Error when deploying GPT4-x-Alpaca on SageMaker via the Hugging Face Model Hub

Hello, thanks for reading. I am having a lot of trouble deploying LLMs on SageMaker. I have been able to deploy the canned AWS foundation models, but whenever I try to use a model from the Hugging Face Hub I run into a similar error. Here is the code I am using to deploy anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.12xlarge' # ec2 instance type
)
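
For completeness, I query the endpoint roughly like this (the payload shape is just the standard text-generation input format, so treat it as an assumption about my exact call):

data = {
	"inputs": "Tell me about Amazon SageMaker."
}

predictor.predict(data)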

The endpoint deploys successfully, but when I query it I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

The CloudWatch logs don't provide any useful information that I can see. Any thoughts?


Curious to know if you found a fix

Not a fix, but an explanation: the transformers version available in the current SageMaker Hugging Face containers predates LLaMA support, so those containers can't load LLaMA-based models at all. I don't think there is a fix, at least not one I have been able to find.
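
You can reproduce the underlying error outside SageMaker. A minimal sketch, assuming a local environment with an older transformers release (LLaMA support landed in transformers 4.28):

from transformers import AutoConfig

# On transformers < 4.28, "llama" is not a registered model_type, so this
# raises KeyError: 'llama' -- the same string the endpoint surfaces as a 400.
config = AutoConfig.from_pretrained(
	"anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g"
)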

I have faced the same problem. Does anyone know how to fix it?

Any fix for this yet?

@philschmid answered my question, which is similar to this issue. His suggested solution is below, and it worked for me:

Can you try to use the new LLM container? Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
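
For anyone else landing here, deployment with the LLM container looks roughly like this. The container version, GPU count, and timeout below are assumptions; check the announcement post for the values that are current when you deploy:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the URI of the Hugging Face LLM (TGI) inference container
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

llm_model = HuggingFaceModel(
	role=role,
	image_uri=llm_image,
	env={
		'HF_MODEL_ID': 'anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g',
		'SM_NUM_GPUS': '4',  # ml.g4dn.12xlarge has 4 GPUs
	}
)

predictor = llm_model.deploy(
	initial_instance_count=1,
	instance_type='ml.g4dn.12xlarge',
	container_startup_health_check_timeout=600,  # give the model time to load
)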

This will not work for gpt4-x-alpaca-13b-native-4bit-128g, since it requires the GPTQ package. You therefore need to create a custom inference.py script and add the latest transformers version plus GPTQ via a requirements.txt.
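
A rough sketch of what that custom code could look like; the package pins and the AutoGPTQ loading call are assumptions, so adapt them to whichever GPTQ loader you actually use:

# requirements.txt (packaged alongside inference.py in code/)
#   transformers>=4.28.0
#   auto-gptq

# inference.py -- SageMaker Hugging Face inference toolkit hooks
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

def model_fn(model_dir):
	# Load the 4-bit GPTQ checkpoint SageMaker unpacked into model_dir
	tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
	model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0")
	return model, tokenizer

def predict_fn(data, model_and_tokenizer):
	model, tokenizer = model_and_tokenizer
	inputs = tokenizer(data["inputs"], return_tensors="pt").to("cuda:0")
	output_ids = model.generate(**inputs, max_new_tokens=256)
	return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}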


This didn't work for me. I got an error in the CloudWatch logs like "huggingface_hub.utils._errors.LocalEntryNotFoundError: File aining_args.safetensors of model ehartford/Wizard-Vicuna-13B-Uncensored not found in /tmp. Please run text-generation-server download-weights ehartford/Wizard-Vicuna-13B-Uncensored first."

I did end up deploying the model in a hacky way using a plain EC2 instance, FastAPI, llama.cpp, and nginx.
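
In case it helps anyone, the core of that setup looked roughly like the sketch below. The model path, port, and generation parameters are assumptions, and nginx simply reverse-proxies to the uvicorn port:

# app.py -- minimal FastAPI wrapper around llama-cpp-python
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()

# Path to a GGML conversion of the model on the EC2 instance (hypothetical path)
llm = Llama(model_path="/opt/models/wizard-vicuna-13b.ggmlv3.q4_0.bin", n_ctx=2048)

class GenerateRequest(BaseModel):
	inputs: str
	max_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
	result = llm(req.inputs, max_tokens=req.max_tokens)
	return {"generated_text": result["choices"][0]["text"]}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
# and point nginx at http://127.0.0.1:8000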

Try a newer transformers_version compatible with your model:
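
For example, something like the sketch below, reusing the hub and role variables from the original snippet. The exact version combination is an assumption on my part; pick one from the list of available Hugging Face inference DLCs:

huggingface_model = HuggingFaceModel(
	transformers_version='4.28.1',  # a release that includes LLaMA support
	pytorch_version='2.0.0',
	py_version='py310',
	env=hub,
	role=role,
)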

Maybe your model needs more than this, but it’s a useful breadcrumb for folks with similar issues.