Calling SageMaker Endpoint for fine-tuned summarization model

Hi HuggingFace community,

I’m attempting to deploy a fine-tuned T5 model for summarization using a SageMaker endpoint. The endpoint deploys successfully with the following code:

from sagemaker.huggingface.model import HuggingFaceModel

# role is an IAM role with SageMaker permissions
huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-endpoint-name",
)

I then try to call the endpoint:

import sagemaker
from sagemaker.huggingface.model import HuggingFacePredictor

sess = sagemaker.Session()
predictor = HuggingFacePredictor(endpoint_name="my-endpoint-name", sagemaker_session=sess)

predictor.predict({
    "inputs": "this is a string",
    "parameters": {"max_length": 20, "min_length": 1},
})

And I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "(\"You need to define one of the following ['feature-extraction', 'text-classification', 'token-classification', 'question-answering', 'table-question-answering', 'fill-mask', 'summarization', 'translation', 'text2text-generation', 'text-generation', 'zero-shot-classification', 'conversational', 'image-classification'] as env 'TASK'.\", 403)"
}

Can anyone tell me where I should specify that this is a summarization model? I can’t find anything about this in the docs. Thanks!
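For what it’s worth, the error body comes from the model container itself (note the InternalServerException inside the 400), not from the SDK, and predictor.predict() just wraps the InvokeEndpoint API. A minimal sketch of the equivalent raw call through the boto3 runtime client, assuming the same endpoint name and a configured default region:

import json
import boto3

# Equivalent raw invocation; hits the same InvokeEndpoint code path
# as predictor.predict() above.
runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": "this is a string",
    "parameters": {"max_length": 20, "min_length": 1},
}

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint-name",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))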

Hello @drcat101,

It looks like the model.tar.gz is malformed. How did you create it? You can check the documentation: Deploy models to Amazon SageMaker
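For anyone checking their archive against this: the container expects the checkpoint files (config.json, pytorch_model.bin, tokenizer files) at the root of model.tar.gz, not inside a subdirectory. A minimal packing sketch, assuming a checkpoint saved with save_pretrained() into a local model/ directory:

import os
import tarfile

# Hypothetical helper: packs the saved checkpoint so config.json,
# pytorch_model.bin, and the tokenizer files sit at the archive root.
def pack_model(model_dir="model", archive="model.tar.gz"):
    with tarfile.open(archive, "w:gz") as tar:
        for name in os.listdir(model_dir):
            tar.add(os.path.join(model_dir, name), arcname=name)

pack_model()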

Thanks @philschmid for your comment, but that wasn’t the problem in this case; the model.tar.gz was fine.

I just found the solution - I needed to add an extra parameter to the model like so:

env = {"HF_TASK": "summarization"}

huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    env=env,  # tells the serving container which pipeline to build
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

This is not documented anywhere for fine-tuned models, but it does appear in the tests for the inference toolkit here: sagemaker-huggingface-inference-toolkit/test_models_from_hub.py at main · aws/sagemaker-huggingface-inference-toolkit · GitHub
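Roughly, that env var tells the serving container which transformers pipeline to construct at startup. A simplified, illustrative sketch of that logic (not the toolkit’s actual code; the real version lives in sagemaker-huggingface-inference-toolkit):

import os
from transformers import pipeline

# Simplified sketch: the container reads HF_TASK and builds the
# corresponding pipeline from the model files in /opt/ml/model.
task = os.environ.get("HF_TASK")  # e.g. "summarization"
summarizer = pipeline(task=task, model="/opt/ml/model")

print(summarizer("this is a string", max_length=20, min_length=1))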

Interesting! Thanks for following up. Normally, if you train the model in SageMaker and then deploy it, the task should be detected automatically. Can you share the config.json inside your model.tar.gz?

Here’s the config.json. I don’t think I specified the task anywhere in the fine-tuning process, but it works…
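For context on that automatic detection: the inference toolkit can infer the task from the architectures field in config.json. A simplified, illustrative sketch of that mapping (the names here are stand-ins, not the toolkit’s exact code):

import json

# Illustrative architecture-to-task mapping; the real table lives
# in the inference toolkit.
ARCHITECTURE_SUFFIX_2_TASK = {
    "ForConditionalGeneration": "text2text-generation",
    "ForSequenceClassification": "text-classification",
    "ForQuestionAnswering": "question-answering",
}

def infer_task(config_path="config.json"):
    with open(config_path) as f:
        architecture = json.load(f)["architectures"][0]
    for suffix, task in ARCHITECTURE_SUFFIX_2_TASK.items():
        if architecture.endswith(suffix):
            return task
    raise ValueError(f"could not infer a task from {architecture}")

# A fine-tuned T5 reports "T5ForConditionalGeneration", so this
# returns "text2text-generation".
print(infer_task())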

This should normally detect the text2text-generation pipeline. Can you please check that your model is loaded and not the default one from transformers? If you only provide a task to pipeline, transformers will load a default model for that task.
Also, what is the structure of your archive, and how did you create it?
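One quick way to run that check locally: build the default pipeline for the task and compare its output against the endpoint’s on your own test texts. A hedged sketch:

from transformers import pipeline

# With only a task string, transformers downloads a default checkpoint,
# not your fine-tuned one.
default_summarizer = pipeline("summarization")
print(default_summarizer("this is a string", max_length=20, min_length=1))

# If the endpoint's outputs match this default model on your own test
# texts, it is probably not serving the fine-tuned weights.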

Yes, my model is loaded and not the default model, the archive is fine, and everything is now working great. I followed the steps in the O’Reilly book with the Pegasus example.
