Error loading finetuned llama2 model while running inference

Hi there, I'm running into the same error, using the same deployment code. After fine-tuning Llama-2-13b-hf, I'm unable to run inference against the endpoint deployed on SageMaker.