Vicuna error on SageMaker

Hi folks,

I have been trying to deploy TheBloke/vicuna-7B-1.1-HF to SageMaker but with no luck. I have not had problems with other models like bloom-3b. I used the following code to deploy:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration.
hub = {

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.2xlarge' # ec2 instance type
)

# send request
predictor.predict({
	"inputs": "Can you please let us know more details about your ",
})

However, I am getting the following error when predicting:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"

Any help is appreciated.
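
For anyone reading the error above: `\u0027` is just a JSON-escaped apostrophe, so the message decodes to the quoted string 'llama'. That is exactly what Python produces when a KeyError for the key "llama" is turned into a string. A plausible reading (an assumption, not confirmed in this thread) is that the container's transformers version did not recognize the llama model type. A minimal sketch of the decoding step:

```python
import json

# The "message" field exactly as it appears in the endpoint response.
raw_message = r'"\u0027llama\u0027"'

# Decoding the JSON string turns \u0027 back into an apostrophe.
decoded = json.loads(raw_message)
print(decoded)  # 'llama'

# str() of a KeyError is the repr of the missing key, quotes included.
print(decoded == str(KeyError("llama")))  # True
```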

Can you try the new LLM container? See: Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
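
A rough sketch of what switching to the LLM container might look like, assuming a SageMaker Python SDK recent enough to provide `get_huggingface_llm_image_uri`. The helper names `build_llm_env` and `deploy_llm`, the token limits, and the startup timeout are illustrative assumptions, not values from this thread. The instance type default is simply the one from the original post and may need to be larger for a 7B model.

```python
from typing import Dict


def build_llm_env(model_id: str, num_gpus: int = 1,
                  max_input_length: int = 1024,
                  max_total_tokens: int = 2048) -> Dict[str, str]:
    """Build the container environment; every value must be a string."""
    return {
        "HF_MODEL_ID": model_id,                    # model to pull from the Hub
        "SM_NUM_GPUS": str(num_gpus),               # GPUs used per replica
        "MAX_INPUT_LENGTH": str(max_input_length),  # assumed limit, tune as needed
        "MAX_TOTAL_TOKENS": str(max_total_tokens),  # assumed limit, tune as needed
    }


def deploy_llm(role: str, version: str, instance_type: str = "ml.g4dn.2xlarge"):
    """Deploy the model with the LLM container and return the predictor."""
    # Imported here so build_llm_env can be used without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Look up the LLM container image for the requested container version.
    image_uri = get_huggingface_llm_image_uri("huggingface", version=version)

    model = HuggingFaceModel(
        role=role,
        image_uri=image_uri,
        env=build_llm_env("TheBloke/vicuna-7B-1.1-HF"),
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        # Large models can take several minutes to load (assumed value).
        container_startup_health_check_timeout=600,
    )
```

Calling `deploy_llm(role, version=...)` with a container version your SDK supports should return a predictor, which can then be queried with `predictor.predict({"inputs": "..."})` as in the original post.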

Thank you! It works with this new container.