Calling a SageMaker endpoint for a fine-tuned summarization model

Hi HuggingFace community,

I’m attempting to deploy a fine-tuned T5 model for summarization using a SageMaker Endpoint. The endpoint is deployed successfully with the following code:

from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-endpoint-name",
)

I then try to call the endpoint:

from sagemaker.huggingface.model import HuggingFacePredictor

predictor = HuggingFacePredictor(endpoint_name="my-endpoint-name", sagemaker_session=sess)

predictor.predict({
    'inputs': 'this is a string',
    'parameters': {'max_length': 20, 'min_length': 1},
})

And I get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "(\"You need to define one of the following [\u0027feature-extraction\u0027, \u0027text-classification\u0027, \u0027token-classification\u0027, \u0027question-answering\u0027, \u0027table-question-answering\u0027, \u0027fill-mask\u0027, \u0027summarization\u0027, \u0027translation\u0027, \u0027text2text-generation\u0027, \u0027text-generation\u0027, \u0027zero-shot-classification\u0027, \u0027conversational\u0027, \u0027image-classification\u0027] as env \u0027TASK\u0027.\", 403)"
}

Can anyone tell me where I should define that I want a summarization model? I can’t find anything in the docs to tell me this. Thanks!

Hello @drcat101,

It looks like the model.tar.gz is malformed. How have you created it? You can check out the documentation: Deploy models to Amazon SageMaker
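
For reference, for a fine-tuned model the archive is normally expected to contain the files written by save_pretrained() at its top level. A minimal sketch (the local paths here are placeholders, not your actual ones):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical local checkpoint from your fine-tuning run
model = AutoModelForSeq2SeqLM.from_pretrained("path/to/finetuned-t5")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-t5")

# Writes config.json, pytorch_model.bin, tokenizer.json, ... into model/
model.save_pretrained("model")
tokenizer.save_pretrained("model")

# model.tar.gz should then contain those files at its root, not nested
# inside a model/ directory.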

Thanks @philschmid for your comment, but that wasn’t the problem in this case; the model.tar.gz was fine.

I just found the solution - I needed to add an extra parameter to the model like so:

env = {'HF_TASK': 'summarization'}

huggingface_model = HuggingFaceModel(
    model_data="s3://my-s3-path/model.tar.gz",
    role=role,
    env=env,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

This is not in the documentation anywhere for fine-tuned models, but does appear in the tests for the inference package here: sagemaker-huggingface-inference-toolkit/test_models_from_hub.py at main · aws/sagemaker-huggingface-inference-toolkit · GitHub
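
For anyone finding this later: as far as I can tell, the inference toolkit uses HF_TASK to decide which transformers pipeline to wrap around the model unpacked from model.tar.gz, so setting it is roughly equivalent to doing this locally (a sketch; the model path is a placeholder):

from transformers import pipeline

# Roughly what the container does when HF_TASK='summarization':
# build a summarization pipeline around the unpacked model artifacts.
summarizer = pipeline("summarization", model="path/to/unpacked-model")

print(summarizer("this is a string", max_length=20, min_length=1))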

Interesting! Thanks for following up. Normally, if you train the model in SageMaker and then deploy it, the task should be detected automatically. Can you share the config.json inside your model.tar.gz?

Here’s the config.json. I don’t think I specified the task anywhere in the fine-tuning process, but it works…

This should normally be detected as the text2text-generation pipeline. Can you please check that your model is actually loaded and not the default one from transformers? (If you only provide a task to the pipeline, transformers loads a default model for that task.)
Also, what is the structure of your archive, and how have you created it?
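
If you want to check the archive locally, something like this should do it (a sketch that assumes the file has been downloaded as model.tar.gz and has a flat layout):

import json
import tarfile

with tarfile.open("model.tar.gz", "r:gz") as tar:
    # Expect config.json, pytorch_model.bin, tokenizer files, ... at the root
    print(tar.getnames())
    # The config should be the one from your fine-tuned checkpoint,
    # not from a stock model
    config = json.load(tar.extractfile("config.json"))
    print(config.get("architectures"), config.get("_name_or_path"))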

Yes, my model is loaded and not the default model, the archive is fine, and everything is now working great. I followed the steps in the O’Reilly book with the Pegasus example.

I have a similar error: I am using a custom inference.py file with a custom model_fn.
My model is a multi_label_classification model, and I have tried to pass the problem type as an env variable: env = {'HF_TASK': 'multi_label_classification'}, but no luck.

Any clue?

Hello @Pidem,

When you implement a custom inference.py, you don’t have to pass any ENV variables.
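
For context, here is a bare sketch of the hooks the inference toolkit looks for in code/inference.py; when they are present, they replace the default HF_TASK-based pipeline handling (the bodies below are placeholders):

# code/inference.py

def model_fn(model_dir):
    # load and return your model (plus anything predict_fn needs)
    ...

def predict_fn(data, model):
    # run inference on the deserialized request and return something JSON-serializable
    ...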

I tried this as well, but that didn’t work. How can I make sure that the inference.py gets picked up? It is in the code/ folder in my case.

How have you created your inference.py and the model.tar.gz?
Here is an example that might help: Deploy FLAN-T5 XXL on Amazon SageMaker

Yes, my inference.py file is in my model.tar.gz:


from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Helper: Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from the unpacked model directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # destruct model and tokenizer
    model, tokenizer = model_and_tokenizer
    
    # Tokenize sentences
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
    
    # return dictionary, which will be json serializable
    return [{"vectors": sentence_embeddings[i].tolist()} for i in range(len(sentence_embeddings))]

and my model.tar.gz folder structure is:

code/inference.py
pytorch_model.bin
tokenizer.json

My config.json file says problem_type: "multi_label_classification", and here is the error message I am seeing

Found the error. In case someone runs into the same thing in the future: I had renamed model.tar.gz to model_inference.tar.gz, and apparently that causes an error :frowning_face:

After breaking my head on this for a whole day: the issue is with how it’s suggested to generate the tar file. When I created the tar file using the command below, I got the error:
!tar -zcvf model.tar.gz

However, generating the tar file with the code below worked like a charm:

import tarfile
import os

# helper to create the model.tar.gz
def compress(tar_dir=None, output_file="model.tar.gz"):
    parent_dir = os.getcwd()
    os.chdir(tar_dir)
    with tarfile.open(os.path.join(parent_dir, output_file), "w:gz") as tar:
        for item in os.listdir("."):
            print(item)
            tar.add(item, arcname=item)
    os.chdir(parent_dir)

compress(str(model_id))
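
To double-check the result, listing the archive members should show the files at the root of the archive rather than under a model/ prefix:

import tarfile

# Expect e.g. 'config.json', 'pytorch_model.bin', ... and not 'model/config.json'
with tarfile.open("model.tar.gz", "r:gz") as tar:
    print(tar.getnames())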

When you create the tarfile with “tar -zcvf model.tar.gz”, you include the “model” directory itself in the archive, which causes this error. You can try something like “tar -zcvf model.tar.gz -C model .” to leave out the “model/” directory and compress only the files inside it.

Yeah, you’re right. I’ve tested that, and it works now. Thank you for sharing.