Error when deploying GPT4-x-Alpaca on SageMaker via HF model hub

Hello, thanks for reading. I am having trouble deploying LLMs on SageMaker. I have been able to deploy the canned AWS foundation models, but when I try to use one from the HF Hub I always get a similar error. Here is what I am doing to deploy anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.12xlarge' # ec2 instance type
)
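
For reference, this is roughly how I am querying the endpoint afterwards (the prompt text here is just a placeholder):

# send a test request to the deployed endpoint (prompt is just a placeholder)
predictor.predict({
	"inputs": "Tell me about Amazon SageMaker.",
})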

The endpoint deploys successfully, but when querying the endpoint I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

The CloudWatch logs don't provide any useful information that I can see. Any thoughts?

Curious to know if you found a fix

Not a fix, but an explanation: the transformers version pinned in the current SageMaker Hugging Face inference containers (4.17.0 in the snippet above) predates LLaMA support, which only landed in transformers 4.28, so they can't load LLaMA-based models in general. The bare "llama" in the error is the model's unrecognized model_type. I don't think there is a fix, at least not one I have been able to find.
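
As a rough illustration of the cause (not verified on the endpoint itself, and assuming transformers==4.17.0 installed locally, the same version pinned in the deployment snippet), loading the model's config fails the same way because 4.17.0 has no "llama" entry in its architecture mapping:

# Assumes transformers==4.17.0, matching the inference container above.
from transformers import AutoConfig

# The repo's config.json declares "model_type": "llama". transformers 4.17.0 does
# not know that model type, so the lookup raises KeyError: 'llama' -- which lines
# up with the "\u0027llama\u0027" message the endpoint returns.
AutoConfig.from_pretrained("anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g")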

I have faced the same problem. Does anyone know how to fix it?