InternalServerException from BART model created from S3

Hi there,
I’m having a problem when invoking a SageMaker endpoint for inference on a Hugging Face model created from an S3 bucket.

When I create an endpoint from the Hub it works normally and I can run my inference code successfully. However, when I create an endpoint for the same model stored in S3, it returns an error at inference time.

First I download the bart-large-mnli model, compress it into a tar.gz, and upload it to my S3 bucket as described here.

git lfs install
git clone {repository}

cd {repository}
tar zcvf model.tar.gz *

aws s3 cp model.tar.gz s3://{my-s3-path}

Then I use the following code to create an endpoint from the model stored in the S3 bucket:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

env = {'HF_TASK': 'summarization'}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://<my-bucket-name>/model.tar.gz",  # path to your trained SageMaker model
    role=role,                        # IAM role with permissions to create an endpoint
    transformers_version="4.26",      # Transformers version used
    pytorch_version="1.13",           # PyTorch version used
    py_version='py39',                # Python version used
    env=env,                          # sets HF_TASK for the inference toolkit
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

The deployment works, but when I try to invoke the model for inference using the following code:

import sagemaker
from sagemaker.huggingface import HuggingFacePredictor

sess = sagemaker.Session()
predictor = HuggingFacePredictor(endpoint_name=ENDPOINT, sagemaker_session=sess)

data = {
    "inputs": "I have a problem with my iphone that needs to be resolved asap!!",
    "parameters": {
        "candidate_labels": ["not urgent"],
        "multi_label": True
    }
}

response = predictor.predict(data)


I receive the following error:

botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The following `model_kwargs` are not used by the model: [\u0027candidate_labels\u0027, \u0027multi_label\u0027] (note: typos in the generate arguments will also show up in this list)"

Using the same inference code against the endpoint created from the Hub (deployed with the snippet from the model’s page, as below) works normally.

hub = {
    'HF_MODEL_ID': 'facebook/bart-large-mnli',
    'HF_TASK': 'zero-shot-classification'
}
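For reference, the Hub-based deployment I used looks roughly like this (a sketch based on the deployment snippet from the model page; the instance type and versions are just the values from my setup):

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    env=hub,                          # HF_MODEL_ID + HF_TASK instead of model_data
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)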

Thank you for any help.

This looks like a mismatch between the HF_TASK you set and the payload you sent: you deployed the S3 model with the summarization pipeline ('HF_TASK': 'summarization'), but you are sending a zero-shot-classification payload, so candidate_labels and multi_label are rejected as unused model_kwargs.
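If you want the S3-hosted model to behave like the Hub endpoint, a minimal sketch of the fix (assuming you keep the same model.tar.gz and the rest of your deployment code) is to set HF_TASK to the task the payload expects when creating the model:

env = {'HF_TASK': 'zero-shot-classification'}

huggingface_model = HuggingFaceModel(
    model_data="s3://<my-bucket-name>/model.tar.gz",
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env=env,                          # task now matches the zero-shot payload
)

Alternatively, keep HF_TASK as summarization and send a summarization payload (just "inputs", without candidate_labels / multi_label).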