Error loading finetuned llama2 model while running inference

I get the same error here for CodeLlama-7b-hf-Instruct.

I even included a requirements.txt file in model.tar.gz pinning transformers==4.33.2, but it doesn't work.
Any ideas?

Do you mean transformers==4.33.2?


@jeremydd Yup, I fine-tuned llama-7b on my own data and deployed it on SageMaker.

Hello @Mit1208! I tried to deploy Llama-7b fine-tuned on my own data with TGI v1.0, as you mentioned, and I still get the error

“FileNotFoundError: No local weights found in /opt/ml/model with extension .bin”

Could you give me more details on how you managed to deploy it on SageMaker? Thanks!

Hi @cnicu,
I was getting the same error when I hadn't specified my model path. I used spot instances for training, so my checkpoints were in S3.

import json

number_of_gpu = 1  # assumption: set to the number of GPUs on your instance

config = {
    'HF_MODEL_ID': '/opt/ml/model',  # path where SageMaker stores the model
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),  # max length of the input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),  # max length of the generation (including input text)
}

# create HuggingFaceModel
llm_model = HuggingFaceModel(

# Deploy model to an endpoint
llm = llm_model.deploy(

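Here, model_data is the S3 URI of the checkpoint. SageMaker downloads it and extracts it into /opt/ml/model, which is the path HF_MODEL_ID points to, so make sure there actually is a model at that S3 URI.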
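One way to check that before deploying is a minimal sketch like the one below. It assumes model_data is a model.tar.gz that SageMaker extracts into /opt/ml/model, so the weight files have to sit at the root of the archive rather than inside a subfolder. pack_model_dir and root_weight_files are hypothetical helpers, not part of any SDK; the check looks for .bin (the extension named in the error above) and also .safetensors.

```python
import tarfile
from pathlib import Path

# Extensions treated as weight files (.bin is the one named in the TGI error).
WEIGHT_SUFFIXES = (".bin", ".safetensors")


def pack_model_dir(model_dir: str, archive_path: str) -> None:
    """Create a gzipped tarball with the contents of model_dir at the archive root."""
    root = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.iterdir()):
            # arcname=path.name drops the parent folder, so files land at the root
            tar.add(path, arcname=path.name)


def root_weight_files(archive_path: str) -> list[str]:
    """Return the weight files that sit at the root of the archive (not in a subfolder)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = [m.name.removeprefix("./") for m in tar.getmembers() if m.isfile()]
    return [n for n in names if "/" not in n and n.endswith(WEIGHT_SUFFIXES)]
```

If root_weight_files returns an empty list for the archive you uploaded, the container will most likely not find any weights in /opt/ml/model either.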


Maybe try using this:

llm_image = get_huggingface_llm_image_uri(

It worked for me.
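For reference, here is a sketch of how the pieces in this thread might fit together end to end. It is not a tested recipe: it assumes the SageMaker Python SDK's HuggingFaceModel and get_huggingface_llm_image_uri, and the TGI image version "1.0.3", the default instance type ml.g5.2xlarge and the 600-second health-check timeout are assumptions to adjust. build_tgi_env and deploy_finetuned_llm are hypothetical names.

```python
import json


def build_tgi_env(model_dir: str = "/opt/ml/model", num_gpus: int = 1,
                  max_input_length: int = 1024, max_total_tokens: int = 2048) -> dict:
    """Environment variables for the TGI container, mirroring the config posted above."""
    return {
        "HF_MODEL_ID": model_dir,  # SageMaker extracts model_data into /opt/ml/model
        "SM_NUM_GPUS": json.dumps(num_gpus),  # GPUs used per replica
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy_finetuned_llm(model_data: str, role: str,
                         instance_type: str = "ml.g5.2xlarge",  # assumption
                         num_gpus: int = 1, tgi_version: str = "1.0.3"):  # assumption
    # Imported here so build_tgi_env can be used without the SageMaker SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    llm_image = get_huggingface_llm_image_uri("huggingface", version=tgi_version)
    llm_model = HuggingFaceModel(
        model_data=model_data,  # S3 URI of model.tar.gz
        role=role,
        image_uri=llm_image,
        env=build_tgi_env(num_gpus=num_gpus),
    )
    return llm_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        container_startup_health_check_timeout=600,  # loading weights can take minutes
    )
```

From a notebook with SageMaker permissions you would call something like deploy_finetuned_llm("s3://your-bucket/path/model.tar.gz", role=sagemaker.get_execution_role()), where the bucket path is a placeholder for your own.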

Thank you! Worked for me!

Could you tell me which instance I would need to deploy a fine-tuned Llama-2-13b? With the same configuration I used to deploy Llama-2-7b successfully, I can't deploy the 13b model. I also tried the ml.g5.8xlarge instance, without success.

Try ml.g5.12xlarge, according to the AWS guide.
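A rough back-of-the-envelope check of why that helps: ml.g5.8xlarge has a single NVIDIA A10G with 24 GB, while ml.g5.12xlarge has four of them. At 2 bytes per parameter (fp16), the 13b weights alone need about 26 GB, which does not fit on one A10G, whereas 7b needs about 14 GB. The sketch below only counts weights (no KV cache or runtime overhead), so treat it as a lower bound; the helper names are hypothetical. On ml.g5.12xlarge you would presumably also raise number_of_gpu (SM_NUM_GPUS) to 4 so TGI shards the model across all four GPUs.

```python
# GPUs per instance for a few G5 sizes; every G5 GPU is an NVIDIA A10G with 24 GB.
G5_GPUS = {
    "ml.g5.2xlarge": 1,
    "ml.g5.8xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
}
A10G_MEMORY_GB = 24


def fp16_weights_gb(num_params_billion: float) -> float:
    """Memory for the weights alone at 2 bytes per parameter (no KV cache, no overhead)."""
    return num_params_billion * 2


def weights_fit(num_params_billion: float, instance_type: str) -> bool:
    """True if fp16 weights fit in the combined GPU memory of the instance."""
    total_gb = G5_GPUS[instance_type] * A10G_MEMORY_GB
    return fp16_weights_gb(num_params_billion) < total_gb


for size in (7, 13):
    for instance in ("ml.g5.8xlarge", "ml.g5.12xlarge"):
        print(f"{size}b on {instance}: fits={weights_fit(size, instance)}")
```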