I'm trying to deploy my fine-tuned Llama 3 model on AWS, so the first step is to create an endpoint.
I used instance_type="ml.g5.4xlarge".
This is my code:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'Guepard/knaine_llama3.1_v0',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    container_startup_health_check_timeout=1200,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})
I get the following error:
UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2024-08-23-08-25-25-823: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint… Try changing the instance type or reference the troubleshooting page Troubleshooting - Amazon SageMaker
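The error points at the CloudWatch logs for the endpoint. For reference, this is a minimal sketch of how those logs could be pulled with boto3; it assumes the default /aws/sagemaker/Endpoints/&lt;endpoint-name&gt; log group and reuses the endpoint name from the error above:

import boto3

# Assumption: SageMaker writes endpoint container logs to the
# /aws/sagemaker/Endpoints/<endpoint-name> log group by default.
endpoint_name = "huggingface-pytorch-tgi-inference-2024-08-23-08-25-25-823"
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

logs = boto3.client("logs")

# List the log streams for this endpoint, newest first
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
)["logStreams"]

# Print the most recent events from each stream
for stream in streams:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )["events"]
    for event in events:
        print(event["message"])

I'm not sure what to look for in that output, though.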
Any solution?