Error hosting endpoint when deploying model

I’m trying to deploy meta-llama/Llama-2-70b-hf using SageMaker, but I’m getting an error I don’t understand.

I created a Jupyter notebook on SageMaker (ml.g5.2xlarge with 256 GB of storage, and later one with 1024 GB).

I copied the “deploy to SageMaker” script from Hugging Face and inserted my token:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'meta-llama/Llama-2-70b-hf',
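	# number of GPUs used per replica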
	'SM_NUM_GPUS': json.dumps(1),
	'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>'
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '<REPLACE WITH YOUR TOKEN>', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	image_uri=get_huggingface_llm_image_uri("huggingface",version="0.9.3"),
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1,
	instance_type="ml.g5.2xlarge",
	container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
	"inputs": "My name is Julien and I like to",
})

I get the following error:
UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-09-06-16-46-01-586: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

The logs for the endpoint just show download progress.

EDIT: for clarity, the last two log entries read:

2023-09-06T16:59:07.045706Z  INFO text_generation_launcher: Download: [9/15] -- ETA: 0:05:43.333332
2023-09-06T16:59:07.045945Z  INFO text_generation_launcher: Download file: model-00010-of-00015.safetensors

The download seems to just stop, and I don’t see why.
Any help is greatly appreciated.

You cannot deploy a 70B model on a g5.2xlarge instance, see: Deploy Llama 2 7B/13B/70B on Amazon SageMaker
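For context: Llama-2-70B needs roughly 140 GB just for the fp16 weights, while a g5.2xlarge has a single A10G GPU with 24 GB of memory. Below is a minimal sketch of the config changes, assuming an 8-GPU instance such as ml.p4d.24xlarge (8x A100 40 GB); the instance type, timeout, and sharding values here are illustrative, not taken from the linked post:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Sketch: shard the 70B model across all 8 GPUs of the instance.
hub = {
	'HF_MODEL_ID': 'meta-llama/Llama-2-70b-hf',
	'SM_NUM_GPUS': json.dumps(8),  # one shard per GPU
	'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>',
}

huggingface_model = HuggingFaceModel(
	image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
	env=hub,
	role=role,
)

predictor = huggingface_model.deploy(
	initial_instance_count=1,
	instance_type="ml.p4d.24xlarge",  # 8x A100 40 GB (illustrative choice)
	# allow extra time to download ~140 GB of weight shards before the ping health check
	container_startup_health_check_timeout=1800,
)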


How can I deploy a quantized model on AWS? Specifically this model, for this scenario:

Deploy this TheBloke/vicuna-13B-v1.5-GGUF model on AWS

I want to use this model as an endpoint in my web application, in the following format:

Chatbot Requirements

  1. Scope: chatbot (encoder/decoder for text inference or conversational).
  2. Input via API (JSON): ChatGPT style; the template can be seen below. The JSON will contain 25 user messages, and the response should be the system’s reply (a sketch of this call follows the list).
    Please use these guidelines to understand API consumption: InvokeEndpoint - Amazon SageMaker
  3. Prompt template for the system:
    a. template = '''
    You are going to be my education assistant.
    System:{System}
    Question:{question}'''
  4. LLM model parameters: max_new_tokens=512, temperature=0.7, top_p=0.9.
  5. If possible, use AutoModelForCausalLM; otherwise, train an LLM.
  6. It will be deployed on AWS SageMaker using S3 buckets.
  7. The GGUF file should be saved in an S3 bucket.
  8. The chat buffer should store 25 conversations and create a session ID (no need to send this to the endpoint).
  9. Use Hugging Face/LangChain when possible.
  10. Deliverables: Jupyter notebook/code; 2 hours should be used to set up the model in AWS with the customer.
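To make items 2–4 concrete, here is a minimal sketch of one way a client could call the endpoint with the template and parameters above. It assumes a TGI-style JSON schema ("inputs"/"parameters") and a hypothetical endpoint name; the actual schema depends on the serving container:

import json
import boto3

ENDPOINT_NAME = "vicuna-13b-endpoint"  # hypothetical; use your deployed endpoint's name

# Prompt template from requirement 3.
template = '''You are going to be my education assistant.
System:{System}
Question:{question}'''

def ask(system, question):
	# Build the payload with the model parameters from requirement 4.
	payload = {
		"inputs": template.format(System=system, question=question),
		"parameters": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9},
	}
	client = boto3.client("sagemaker-runtime")
	response = client.invoke_endpoint(
		EndpointName=ENDPOINT_NAME,
		ContentType="application/json",
		Body=json.dumps(payload),
	)
	return json.loads(response["Body"].read())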

Please provide complete source code that I can use in my Jupyter notebook on AWS to create an endpoint.
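As a starting point for requirements 6 and 7 only, here is a minimal sketch of downloading one quantized file from the Hub and copying it to S3. The file and bucket names are assumptions, and note that the stock TGI container does not serve GGUF files, so serving this model would need a container that runs llama.cpp or similar:

import boto3
from huggingface_hub import hf_hub_download

# Download one GGUF file from the Hub (the file name assumes TheBloke's usual
# naming pattern; check the repo's file list) and copy it to S3 per requirements 6-7.
local_path = hf_hub_download(
	repo_id="TheBloke/vicuna-13B-v1.5-GGUF",
	filename="vicuna-13b-v1.5.Q4_K_M.gguf",
)

s3 = boto3.client("s3")  # the bucket name below is hypothetical
s3.upload_file(local_path, "my-model-bucket", "models/vicuna-13b-v1.5.Q4_K_M.gguf")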