Error code 400 when running Llama 2 on a SageMaker endpoint

I followed @philschmid's blog on fine-tuning and then deployed the model to an endpoint with the code below. Invoking the endpoint returned the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

Deployment Code:

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_s3,             # S3 path to the fine-tuned model artifact
    role=role,                       # IAM role with SageMaker permissions
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    model_server_workers=1,
)

# deploy the model to a real-time endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="llama2-7b-1",
)
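
For context, the error is raised at invocation time, not at deployment. The call that triggers it looks roughly like the sketch below; the prompt and generation parameters are placeholders, not the exact payload used:

# hypothetical invocation; the actual payload may differ
data = {
    "inputs": "What is Amazon SageMaker?",      # placeholder prompt
    "parameters": {"max_new_tokens": 128},      # placeholder generation settings
}
response = predictor.predict(data)
print(response)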

To deploy Llama 2 you should use the new Hugging Face LLM Inference Container for Amazon SageMaker: Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
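
A minimal deployment sketch with the LLM (TGI) container is shown below, assuming model_s3 points at the fine-tuned Llama 2 artifact and a recent SageMaker Python SDK; the container version and environment values are illustrative, so check the current ones for your setup:

import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# retrieve the Hugging Face LLM (TGI) container image URI
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.9.3")  # example version

# create the model using the LLM container instead of the standard inference toolkit
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data=model_s3,                    # S3 path to the fine-tuned model artifact
    env={
        "HF_MODEL_ID": "/opt/ml/model",     # load weights from the unpacked artifact
        "SM_NUM_GPUS": json.dumps(1),       # GPUs available on ml.g5.2xlarge
        "MAX_INPUT_LENGTH": json.dumps(2048),
        "MAX_TOTAL_TOKENS": json.dumps(4096),
    },
)

# deploy; loading the weights can take a while, so raise the startup health check timeout
llm_predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)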
