Issue deploying quantized meta-llama/Llama-3.1-8B-Instruct on AWS SageMaker

I quantized meta-llama/Llama-3.1-8B-Instruct to 4-bit with bitsandbytes and tested it using the following ECR images:

  1. huggingface-pytorch-tgi-inference:
    763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04-v1.0
  2. pytorch-inference:
    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.4.0-gpu-py311-cu124-ubuntu22.04-sagemaker

The code works fine locally. However, when I deploy it to a SageMaker endpoint, the container crashes with the following logs (the same for both ECR images):
2024-10-10T19:22:25,964 [WARN ] W-9003-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2024-10-10T19:22:25,968 [INFO ] W-9003-model_1.0-stdout MODEL_LOG - File "/opt/ml/model/code/inference.py", line 2, in
2024-10-10T19:22:25,968 [INFO ] W-9003-model_1.0-stdout MODEL_LOG - from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
2024-10-10T19:22:25,968 [INFO ] W-9003-model_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'

For reference, the model.tar.gz is laid out like this:

model.tar.gz
|- model artifacts
|- code/
   |- inference.py      # inference script
   |- requirements.txt  # optional; installs additional dependencies (if supported by the framework version)
And requirements.txt is as follows:

transformers>=4.45
accelerate==0.34.2
bitsandbytes==0.44.1
peft==0.13.1

This is a strange issue, as the same code works locally in a custom Docker image built on top of the ECR images mentioned above. Is there a known resolution, or am I missing something?
