Error when deploying GPT4-Alpaca on SageMaker via HF model hub

Hello, thanks for reading. I am having a lot of trouble deploying LLMs on SageMaker. I have been able to deploy the canned AWS foundation models, but whenever I try to use a model from the HF Hub I get a similar error. Here is the code I am using to deploy anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration.
hub = {

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.12xlarge' # ec2 instance type

The endpoint deploys successfully, but when querying the endpoint I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"

The CloudWatch logs don’t provide any useful information that I can see. Any thoughts?
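
For anyone trying to read that message field: it is JSON-escaped, and decoding it gives the string 'llama' with the quotes included. That is what Python prints for a KeyError raised on the key llama, which suggests (but does not prove) a failed lookup of that name somewhere in the container. A minimal sketch, assuming the response body is closed with a brace that the paste above cuts off:

```python
import json

# The 400 response body from the error above. The closing brace is an
# assumption, because the pasted error is cut off after the "message" line.
raw_body = r'{"code": 400, "type": "InternalServerException", "message": "\u0027llama\u0027"}'

payload = json.loads(raw_body)
message = payload["message"]

# \u0027 is the JSON escape for a single quote, so this prints 'llama'
print(message)

# str() of a KeyError on the key "llama" yields the same text, quotes included.
print(message == str(KeyError("llama")))
```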

Curious to know if you found a fix

Not a fix, but an explanation. The current SageMaker instances don't support the transformers version that the LLaMA models need, so they can't run them in general. I don't think there is a fix, at least not one that I have been able to find.
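
A rough way to check a given transformers version against that explanation is sketched below. It assumes LLaMA support first shipped in transformers 4.28.0, which is worth verifying against the release notes. The helper names `parse_version` and `supports_llama` are made up for this example and are not part of any library.

```python
import re

# Assumption: LLaMA model support was first released in transformers 4.28.0.
MIN_LLAMA_VERSION = (4, 28, 0)


def parse_version(version: str) -> tuple:
    """Return the first three numeric components, e.g. '4.26.0' -> (4, 26, 0).

    Pre-release suffixes such as 'rc1' or 'dev0' are ignored in this sketch.
    """
    nums = [int(p) for p in re.findall(r"\d+", version)[:3]]
    while len(nums) < 3:
        nums.append(0)  # pad short versions like '4.28' to three components
    return tuple(nums)


def supports_llama(version: str) -> bool:
    """True if the version is at or above the assumed minimum for LLaMA."""
    return parse_version(version) >= MIN_LLAMA_VERSION


print(supports_llama("4.26.0"))
print(supports_llama("4.28.1"))
```

Inside the container, the string to test would be `transformers.__version__`.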

I have run into the same problem. Does anyone know how to fix it?