Kernel specifications:
Image: Data Science 3.0
Kernel: Python 3
Instance type: ml.t3.medium
Start-up script: No script
This is my exact notebook code, copied from the “Deploy” button on https://huggingface.co/HuggingFaceM4/idefics-80b:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'HuggingFaceM4/idefics-80b',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # EC2 instance type
)

data = {
    "inputs": "Can you please let us know more details about your "
}

predictor.predict(data)
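For context, `predictor.predict(data)` JSON-serializes the dict and calls the SageMaker `InvokeEndpoint` API, which is where the traceback below originates. As a sanity check I can reproduce the same request with the low-level runtime client; this is only a sketch — the `invoke_raw` helper and its arguments are illustrative, not part of the SageMaker SDK:

```python
import json


def invoke_raw(endpoint_name: str, payload: dict) -> str:
    """Call a deployed endpoint directly, bypassing the Predictor wrapper.

    Illustrative helper: sends the same JSON body predict() would send.
    Requires AWS credentials and a live endpoint, so it is defined here
    but not executed.
    """
    import boto3  # available in the SageMaker Studio images

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,        # e.g. predictor.endpoint_name
        ContentType="application/json",    # predict() defaults to JSON
        Body=json.dumps(payload),
    )
    return response["Body"].read().decode()


# The exact body predict() sends for the data dict above:
body = json.dumps({"inputs": "Can you please let us know more details about your "})
```

Invoking the endpoint this way produces the same 400 response, so the problem is on the model-server side rather than in how the request is built.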
The model deploys without errors and I can see the endpoint in the console. However, calling the predict method always raises this error:
ModelError Traceback (most recent call last)
Cell In[17], line 1
----> 1 predictor.predict(data)
File /opt/conda/lib/python3.10/site-packages/sagemaker/base_predictor.py:185, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes)
138 """Return the inference from the specified endpoint.
139
140 Args:
(...)
174 as is.
175 """
177 request_args = self._create_request_args(
178 data,
179 initial_args,
(...)
183 custom_attributes,
184 )
--> 185 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
186 return self._handle_response(response)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
531 raise TypeError(
532 f"{py_operation_name}() only accepts keyword arguments."
533 )
534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
978 error_code = parsed_response.get("Error", {}).get("Code")
979 error_class = self.exceptions.from_code(error_code)
--> 980 raise error_class(parsed_response, operation_name)
981 else:
982 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027idefics\u0027"
}
".
What can I do to fix this issue and properly invoke the endpoint?