Vicuna error on SageMaker

Hi folks,

I have been trying to deploy TheBloke/vicuna-7B-1.1-HF to SageMaker but with no luck. I have not had problems with other models like bloom-3b. I used the following code to deploy:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'TheBloke/vicuna-7B-1.1-HF',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.26.0',
	pytorch_version='1.13.1',
	py_version='py39',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.2xlarge' # ec2 instance type
)

predictor.predict({
	"inputs": "Can you please let us know more details about your ",
})

However, I am getting the following error when predicting:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

Any help is appreciated.

Can you try the new LLM container? See "Introducing the Hugging Face LLM Inference Container for Amazon SageMaker".
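For anyone landing here with the same error, this is a minimal sketch of what a deployment with the LLM container might look like. It assumes a recent `sagemaker` SDK that provides `get_huggingface_llm_image_uri`. The helper names `build_llm_env` and `deploy_llm`, the token limits, and the startup timeout are illustrative assumptions, not values confirmed in this thread.

```python
import json


def build_llm_env(model_id: str, num_gpus: int = 1,
                  max_input_length: int = 1024,
                  max_total_tokens: int = 2048) -> dict:
    """Build the container environment (hypothetical helper).

    Environment values must be strings, so numbers are serialized.
    The limits used here are assumed defaults, not tuned values.
    """
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy_llm(model_id: str, role: str,
               instance_type: str = "ml.g4dn.2xlarge"):
    """Deploy a Hub model with the LLM container (hypothetical helper).

    The default instance type is the one from the question; a larger
    GPU instance may be needed depending on the model.
    """
    # Imported lazily so build_llm_env works without the SDK installed.
    from sagemaker.huggingface import (
        HuggingFaceModel,
        get_huggingface_llm_image_uri,
    )

    # Look up the LLM container image instead of passing
    # transformers_version / pytorch_version / py_version.
    image_uri = get_huggingface_llm_image_uri("huggingface")

    model = HuggingFaceModel(
        image_uri=image_uri,
        env=build_llm_env(model_id),
        role=role,
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        # Assumed timeout: large models can take a while to load.
        container_startup_health_check_timeout=300,
    )
```

Calling `deploy_llm('TheBloke/vicuna-7B-1.1-HF', role)` would then return a predictor that can be used with `predictor.predict({"inputs": "..."})` as in the question.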


Thank you! It works with this new container.

@philschmid

I am using the following code:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

sess = sagemaker.Session()

# fall back to the session's default bucket if none is set
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
	sagemaker_session_bucket = sess.default_bucket()

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

# Hub Model configuration
hub = {
	'HF_MODEL_ID':'model id',
	'HF_TASK':'text-classification',
	'HF_TOKEN':'token'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.45.2',
	pytorch_version='2.2.0',
	py_version='py310',
	env=hub,
	role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge', # ec2 instance type
	endpoint_name='classifier'
)

predictor.predict({
	"inputs": "I like you. I love you",
})

But it gives me the following error:
ValueError: Unsupported huggingface version: 4.45.2. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.26.0, 4.28.1, 4.37.0, 4.6, 4.10, 4.11, 4.12, 4.17, 4.26, 4.28, 4.37.

I am deploying from a SageMaker Jupyter notebook with sagemaker SDK version 2.23.2.
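The error message itself points at the installed SDK: the list of supported versions is built into the `sagemaker` package, so an older SDK does not know about newer containers. As a rough sketch, the snippet below reads the installed SDK version and picks the newest entry from the list printed in the error. The helper names `parse_version`, `newest_supported`, and `installed_sdk_version` are hypothetical and are not part of the SDK.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '2.23.2' into (2, 23, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def newest_supported(supported_csv: str) -> str:
    """Pick the highest version from the comma-separated list in the error."""
    versions = [v.strip().rstrip(".") for v in supported_csv.split(",")]
    return max(versions, key=parse_version)


def installed_sdk_version():
    """Return the installed sagemaker SDK version, or None if not installed."""
    try:
        return version("sagemaker")
    except PackageNotFoundError:
        return None


# List copied from the ValueError above.
supported = ("4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.26.0, 4.28.1, "
             "4.37.0, 4.6, 4.10, 4.11, 4.12, 4.17, 4.26, 4.28, 4.37.")

print("installed sagemaker SDK:", installed_sdk_version())
print("newest supported transformers_version:", newest_supported(supported))
```

Two options follow from this: upgrade the SDK as the message suggests (`pip install -U sagemaker`, then restart the notebook kernel), or pass a `transformers_version` from the supported list. In the second case the matching `pytorch_version` and `py_version` for that container would need to be checked as well.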