Error deploying endpoint on AWS

I'm trying to deploy my fine-tuned Llama 3.1 model on AWS, so the first step is to create an endpoint.
I used instance_type="ml.g5.4xlarge".
This is my code:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Hub model configuration: https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "Guepard/knaine_llama3.1_v0",
    "HF_TASK": "text-generation",
}

# create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=hub,
    role=role,
)

# deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    container_startup_health_check_timeout=1200,
)

# send a request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})

I get the following error:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2024-08-23-08-25-25-823: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint… Try changing the instance type or reference the troubleshooting page Troubleshooting - Amazon SageMaker

Any solution?

Hi @Knour13,
Did you check the CloudWatch logs?
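If you'd rather not dig through the console, here is a minimal sketch of tailing the endpoint's container logs with boto3. It assumes the standard SageMaker convention that endpoint logs land in the log group /aws/sagemaker/Endpoints/<endpoint-name>, and that your credentials can read CloudWatch Logs; the endpoint name below is the one from your error message.

```python
def endpoint_log_group(endpoint_name: str) -> str:
    # SageMaker writes endpoint container logs to this log group by convention
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"

def tail_endpoint_logs(endpoint_name: str, limit: int = 50) -> None:
    # Print the most recent events from the newest log stream of the endpoint
    import boto3

    logs = boto3.client("logs")
    group = endpoint_log_group(endpoint_name)
    streams = logs.describe_log_streams(
        logGroupName=group, orderBy="LastEventTime", descending=True
    )["logStreams"]
    if not streams:
        print("No log streams yet - the container may have died before logging.")
        return
    events = logs.get_log_events(
        logGroupName=group,
        logStreamName=streams[0]["logStreamName"],
        limit=limit,
    )["events"]
    for event in events:
        print(event["message"])

if __name__ == "__main__":
    # Log group for the failed endpoint from the error message
    print(endpoint_log_group("huggingface-pytorch-tgi-inference-2024-08-23-08-25-25-823"))
```

The last lines of the newest stream usually show why the TGI container never answered the /ping health check (out-of-memory, unsupported model config, download failure, etc.).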

I found these errors in CloudWatch:


Hi @philschmid, do you have any solution?
Thanks

Try TGI 2.2.0
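A possible reason this matters: your model is a Llama 3.1 fine-tune, and older TGI containers such as 2.0.2 reportedly reject Llama 3.1's rope_scaling config, so the container crashes before passing the health check. A minimal sketch of rebuilding the model with the 2.2.0 image, assuming the sagemaker SDK is installed, that 2.2.0 is available in your region, and reusing the role and hub dict from your original code:

```python
def tgi_image_kwargs(version: str = "2.2.0") -> dict:
    # Arguments passed through to get_huggingface_llm_image_uri below;
    # "2.2.0" is the TGI version suggested in this thread.
    return {"backend": "huggingface", "version": version}

def build_model(role, hub):
    # Same HuggingFaceModel setup as the original post, only the TGI version changes
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    kwargs = tgi_image_kwargs()
    image_uri = get_huggingface_llm_image_uri(kwargs["backend"], version=kwargs["version"])
    return HuggingFaceModel(image_uri=image_uri, env=hub, role=role)
```

The deploy call itself stays unchanged; only the container image version differs.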

Same error


This is the CloudWatch output: