Error code 400 when running llama2 on sagemaker endpoint

admangan4400 · July 21, 2023, 12:57pm

I followed @philschmid's blog post on fine-tuning, then deployed the model to an endpoint with the code below. Invoking the endpoint returned the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

Deployment Code:

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_s3,  # Change to your model path
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    model_server_workers=1,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="llama2-7b-1",
)

philschmid · July 24, 2023, 7:19am

To deploy Llama you should use the new LLM container. See the blog post: Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
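
For anyone landing here, a minimal sketch of what switching to the LLM container might look like. It reuses `model_s3`, `role`, the instance type, and the endpoint name from the question. Everything else is an assumption rather than something stated in this thread: the helper names `build_llm_env` and `deploy_llama` are hypothetical, the container version `"0.9.3"` and the token limits are placeholders to adjust, and the archive at `model_s3` is assumed to contain full (merged) model weights rather than adapter weights only.

```python
import json


def build_llm_env(num_gpus=1, max_input_length=2048, max_total_tokens=4096):
    """Build the environment variables passed to the LLM container.

    The values are illustrative defaults (assumptions), not tuned settings.
    """
    return {
        # Load the weights that SageMaker extracts from model_data.
        "HF_MODEL_ID": "/opt/ml/model",
        # The container expects these values as strings.
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy_llama(model_s3, role, endpoint_name="llama2-7b-1", image_version="0.9.3"):
    """Deploy a fine-tuned Llama model with the Hugging Face LLM container."""
    # Imported here so build_llm_env can be used without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Look up the LLM container image instead of pinning
    # transformers_version / pytorch_version / py_version.
    llm_image = get_huggingface_llm_image_uri("huggingface", version=image_version)

    huggingface_model = HuggingFaceModel(
        model_data=model_s3,  # Change to your model path
        role=role,
        image_uri=llm_image,
        env=build_llm_env(),
    )

    return huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        endpoint_name=endpoint_name,
        # Large models can take several minutes to load.
        container_startup_health_check_timeout=300,
    )
```

Calling `predictor = deploy_llama(model_s3, role)` would then replace the deployment code from the question. The main difference is that the image is selected via `image_uri`, so the container no longer depends on a pinned `transformers_version`.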
