Calling a SageMaker endpoint for a fine-tuned summarization model

Hi HuggingFace community,

I’m attempting to deploy a fine-tuned T5 model for summarization using a SageMaker Endpoint. The endpoint is deployed successfully with the following code:

from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-endpoint-name",
)

I then try to call the endpoint:

from sagemaker.huggingface.model import HuggingFacePredictor

predictor = HuggingFacePredictor(endpoint_name="my-endpoint-name", sagemaker_session=sess)

predictor.predict({
    'inputs': 'this is a string',
    'parameters': {'max_length': 20, 'min_length': 1},
})

And I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "(\"You need to define one of the following [\u0027feature-extraction\u0027, \u0027text-classification\u0027, \u0027token-classification\u0027, \u0027question-answering\u0027, \u0027table-question-answering\u0027, \u0027fill-mask\u0027, \u0027summarization\u0027, \u0027translation\u0027, \u0027text2text-generation\u0027, \u0027text-generation\u0027, \u0027zero-shot-classification\u0027, \u0027conversational\u0027, \u0027image-classification\u0027] as env \u0027TASK\u0027.\", 403)"
}

Can anyone tell me where I should define that I want a summarization model? I can’t find anything in the docs to tell me this. Thanks!

Hello @drcat101,

It looks like the model.tar.gz is malformed. How have you created it? You can check out the documentation: Deploy models to Amazon SageMaker
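
For reference, for a fine-tuned model the archive is normally expected to contain the files written by save_pretrained() at its top level. A minimal sketch (the local paths here are placeholders, not your actual ones):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical local checkpoint from your fine-tuning run
model = AutoModelForSeq2SeqLM.from_pretrained("path/to/finetuned-t5")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-t5")

# Writes config.json, pytorch_model.bin, tokenizer.json, ... into model/
model.save_pretrained("model")
tokenizer.save_pretrained("model")

# model.tar.gz should then contain those files at its root, not nested
# inside a model/ directory.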

Thanks @philschmid for your comment, but that wasn’t the problem in this case; the model.tar.gz was fine.

I just found the solution - I needed to add an extra parameter to the model like so:

env = {'HF_TASK': 'summarization'}

huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    env=env,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

This is not in the documentation anywhere for fine-tuned models, but does appear in the tests for the inference package here: sagemaker-huggingface-inference-toolkit/test_models_from_hub.py at main · aws/sagemaker-huggingface-inference-toolkit · GitHub
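
For anyone finding this later: as far as I can tell, the inference toolkit uses HF_TASK to decide which transformers pipeline to wrap around the model unpacked from model.tar.gz, so setting it is roughly equivalent to doing this locally (a sketch; the model path is a placeholder):

from transformers import pipeline

# Roughly what the container does when HF_TASK='summarization':
# build a summarization pipeline around the unpacked model artifacts.
summarizer = pipeline("summarization", model="path/to/unpacked-model")

print(summarizer("this is a string", max_length=20, min_length=1))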

Interesting! Thanks for following up. Normally, if you train the model in SageMaker and then deploy it, the task should be detected automatically. Can you share the config.json inside your model.tar.gz?

Here’s the config.json. I don’t think I specified the task anywhere in the fine-tuning process, but it works…

This should normally be detected as the text2text-generation pipeline. Can you please check that your model is actually loaded and not the default one from transformers? (If you only provide a task to the pipeline, transformers loads a default model for that task.)
Also, what is the structure of your archive, and how have you created it?
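
If you want to check the archive locally, something like this should do it (a sketch that assumes the file has been downloaded as model.tar.gz and has a flat layout):

import json
import tarfile

with tarfile.open("model.tar.gz", "r:gz") as tar:
    # Expect config.json, pytorch_model.bin, tokenizer files, ... at the root
    print(tar.getnames())
    # The config should be the one from your fine-tuned checkpoint,
    # not from a stock model
    config = json.load(tar.extractfile("config.json"))
    print(config.get("architectures"), config.get("_name_or_path"))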

Yes, my model is loaded and not the default model, the archive is fine, and everything is now working great. I followed the steps in the O’Reilly book with the Pegasus example.

I have a similar error: I am using a custom inference.py file with a custom model_fn.
My model is a multi_label_classification model, and I have tried to pass the problem type as an env variable: env = {'HF_TASK': 'multi_label_classification'}, but no luck.

Any clue?

Hello @Pidem,

When you implement a custom inference.py, you don’t have to pass any ENV variables.
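
For context, here is a bare sketch of the hooks the inference toolkit looks for in code/inference.py; when they are present, they replace the default HF_TASK-based pipeline handling (the bodies below are placeholders):

# code/inference.py

def model_fn(model_dir):
    # load and return your model (plus anything predict_fn needs)
    ...

def predict_fn(data, model):
    # run inference on the deserialized request and return something JSON-serializable
    ...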

I tried this as well, but that didn’t work. How can I make sure that the inference.py gets picked up? It is in the code/ folder in my case.

How have you created your inference.py and the model.tar.gz?
Here is an example that might help: Deploy FLAN-T5 XXL on Amazon SageMaker

Yes, my inference.py file is in my model.tar.gz:


from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Helper: Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from the unpacked model directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # destruct model and tokenizer
    model, tokenizer = model_and_tokenizer
    
    # Tokenize sentences
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
    
    # return dictionary, which will be json serializable
    return [{"vectors": sentence_embeddings[i].tolist()} for i in range(len(sentence_embeddings))]

and my model.tar.gz folder structure is:

code/inference.py
pytorch_model.bin
tokenizer.json

My config.json file says problem_type: "multi_label_classification", and here is the error message I am seeing

Found the error. In case someone runs into the same thing in the future: I had renamed model.tar.gz to model_inference.tar.gz, and apparently that causes an error :frowning_face:

After breaking my head on this for a whole day: the issue is with how it’s suggested to generate the tar file. When I created the tar file using the command below, I got the error:
!tar -zcvf model.tar.gz

However, generating the tar file with the code below worked like a charm:

import tarfile
import os

# helper to create the model.tar.gz
def compress(tar_dir=None, output_file="model.tar.gz"):
    parent_dir = os.getcwd()
    os.chdir(tar_dir)
    with tarfile.open(os.path.join(parent_dir, output_file), "w:gz") as tar:
        for item in os.listdir("."):
            print(item)
            tar.add(item, arcname=item)
    os.chdir(parent_dir)

compress(str(model_id))
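
To double-check the result, listing the archive members should show the files at the root of the archive rather than under a model/ prefix:

import tarfile

# Expect e.g. 'config.json', 'pytorch_model.bin', ... and not 'model/config.json'
with tarfile.open("model.tar.gz", "r:gz") as tar:
    print(tar.getnames())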

When you create the tarfile with “tar -zcvf model.tar.gz”, you include the “model” directory itself in the archive, which causes this error. You can try something like “tar -zcvf model.tar.gz -C model .” to leave out the “model/” directory and compress only the files inside it.

Yeah, you’re right. I’ve tested that, and it works now. Thank you for sharing.