Kernel specifications:
Image: Data Science 3.0
Kernel: Python 3
Instance type: ml.t3.medium
Start-up script: No script
This is my exact notebook code, copied from the “Deploy” button on https://huggingface.co/HuggingFaceM4/idefics-80b:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'HuggingFaceM4/idefics-80b',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # EC2 instance type
)

data = {
    "inputs": "Can you please let us know more details about your "
}

predictor.predict(data)
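For context, `predictor.predict(data)` JSON-serializes the dict and calls the SageMaker `InvokeEndpoint` API, which is where the traceback below originates. As a sanity check I can reproduce the same request with the low-level runtime client; this is only a sketch — the `invoke_raw` helper and its arguments are illustrative, not part of the SageMaker SDK:

```python
import json


def invoke_raw(endpoint_name: str, payload: dict) -> str:
    """Call a deployed endpoint directly, bypassing the Predictor wrapper.

    Illustrative helper: sends the same JSON body predict() would send.
    Requires AWS credentials and a live endpoint, so it is defined here
    but not executed.
    """
    import boto3  # available in the SageMaker Studio images

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,        # e.g. predictor.endpoint_name
        ContentType="application/json",    # predict() defaults to JSON
        Body=json.dumps(payload),
    )
    return response["Body"].read().decode()


# The exact body predict() sends for the data dict above:
body = json.dumps({"inputs": "Can you please let us know more details about your "})
```

Invoking the endpoint this way produces the same 400 response, so the problem is on the model-server side rather than in how the request is built.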
The model deploys without errors and I can see the endpoint in the console. However, calling the predict method always raises this error:
ModelError Traceback (most recent call last)
Cell In[17], line 1
----> 1 predictor.predict(data)
File /opt/conda/lib/python3.10/site-packages/sagemaker/base_predictor.py:185, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes)
138 """Return the inference from the specified endpoint.
139
140 Args:
(...)
174 as is.
175 """
177 request_args = self._create_request_args(
178 data,
179 initial_args,
(...)
183 custom_attributes,
184 )
--> 185 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
186 return self._handle_response(response)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
531 raise TypeError(
532 f"{py_operation_name}() only accepts keyword arguments."
533 )
534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
978 error_code = parsed_response.get("Error", {}).get("Code")
979 error_class = self.exceptions.from_code(error_code)
--> 980 raise error_class(parsed_response, operation_name)
981 else:
982 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027idefics\u0027"
}
".
What can I do to fix this issue and properly invoke the endpoint?