Hello, thanks for reading. I am having a lot of trouble deploying LLMs on SageMaker. I have been able to deploy the canned AWS foundation models, but whenever I try to use a model from the Hugging Face Hub I run into a similar error. Here is what I am running to deploy anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,          # number of instances
    instance_type='ml.g4dn.12xlarge'   # ec2 instance type
)
The endpoint deploys successfully, but when I query it I get the following error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027llama\u0027"
}
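For reference, the call that triggers this is just a standard predict on the returned predictor (the prompt here is a placeholder, not my actual payload):

predictor.predict({"inputs": "Tell me a joke."})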
The CloudWatch logs don't provide any useful information that I can see. Any thoughts?
Not a fix, but an explanation: the transformers version shipped in the current SageMaker Hugging Face containers predates LLaMA support, so the container can't load LLaMA-based models at all. The \u0027llama\u0027 in the message is the model_type key it fails to recognize. I don't think there is a fix, at least not one I have been able to find.
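You can sanity-check this locally (a quick sketch; the auto-config registry only gained a "llama" entry in later transformers releases, which is why the lookup fails on the 4.17.0 container):

import transformers
from transformers import CONFIG_MAPPING

print(transformers.__version__)
# False on 4.17.0 (the version pinned in the deploy snippet above),
# True on releases that added LLaMA support
print("llama" in CONFIG_MAPPING)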
This will not work for gpt4-x-alpaca-13b-native-4bit-128g since it requires the GPTQ package. Therefore you need to create a custom inference.py script and add the latest transformers version + GPTQ via a requirements.txt, roughly as sketched below.
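Something along these lines (a sketch only: the auto-gptq loading call, the use_safetensors flag, and the assumption that the quantized checkpoint sits directly in model_dir are things you would have to adapt to the actual files in that repo):

code/requirements.txt:
transformers>=4.28.0
auto-gptq

code/inference.py:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

def model_fn(model_dir):
    # model_dir is where SageMaker unpacks your model.tar.gz on the endpoint
    tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
    model = AutoGPTQForCausalLM.from_quantized(
        model_dir,
        device="cuda:0",
        use_safetensors=True,  # assumption: the 4-bit weights are stored as .safetensors
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # mimic the pipeline-style output the toolkit normally returns
    return [{"generated_text": text}]

model_fn and predict_fn are the standard override hooks the SageMaker Hugging Face inference toolkit looks for in code/inference.py, and requirements.txt is installed when the container starts, which is how you get a newer transformers than the 4.17.0 baked into the image.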
This didn't work for me; I got an error in the CloudWatch logs like "huggingface_hub.utils._errors.LocalEntryNotFoundError: File aining_args.safetensors of model ehartford/Wizard-Vicuna-13B-Uncensored not found in /tmp. Please run text-generation-server download-weights ehartford/Wizard-Vicuna-13B-Uncensored first."
I did get the model deployed in a hacky way, using a plain EC2 instance, FastAPI, llama.cpp, and nginx.
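In case it helps anyone, the core of it was just a small FastAPI app wrapping llama-cpp-python, run with uvicorn and proxied by nginx (a minimal sketch; the model path, route name, and parameters are placeholders, and the file is assumed to be a ggml/gguf conversion of the weights):

from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
# placeholder path to the converted checkpoint on the instance
llm = Llama(model_path="/opt/models/model.bin", n_ctx=2048)

class GenerateRequest(BaseModel):
    inputs: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    out = llm(req.inputs, max_tokens=req.max_new_tokens)
    return {"generated_text": out["choices"][0]["text"]}

Run it with something like uvicorn server:app --host 127.0.0.1 --port 8000 and put nginx in front as a reverse proxy.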