How to deploy a T5 model to AWS SageMaker for fast inference?

Thanks @philschmid for this information about T5 in SageMaker Inference (no compression until today).

I used the translation script (I ran it locally in an AWS SageMaker notebook instance, as I made some changes to it). It comes with a `requirements.txt` (see my modified content below), but this file does not install `transformers==4.15`:

# content of my modified requirements.txt file
accelerate
datasets >= 1.16.0
sentencepiece != 0.1.92
protobuf
sacrebleu >= 1.4.12
py7zr
torch >= 1.3
jiwer
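If a specific `transformers` version is needed inside the container, one option (a sketch I have not verified on every DLC) is to pin it with an extra `transformers==4.15.0` line in `requirements.txt`, and to fail fast in the training script if the pin was not picked up:

# hypothetical sanity check at the top of run_translation.py;
# assumes requirements.txt contains the extra line: transformers==4.15.0
import transformers

assert transformers.__version__.startswith("4.15"), (
    f"expected transformers 4.15.x, got {transformers.__version__}"
)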

Then, I trained my T5 model on the AWS SageMaker Training DLC with the library versions from Reference >> Training DLC Overview. As shown in the following code from my notebook, I used `transformers==4.12.3` and PyTorch 1.9.1:

import sagemaker
from sagemaker.huggingface import HuggingFace

print(sagemaker.__version__)
# 2.72.1

huggingface_estimator = HuggingFace(
    base_job_name=base_job_name,
    checkpoint_s3_uri=checkpoint_s3_bucket,
    checkpoint_local_path=checkpoint_local_path,
    entry_point='run_translation.py',
    source_dir='./translation',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38',
    hyperparameters=hyperparameters,
    # (...) remaining arguments elided
)
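For completeness, this is roughly how the job is then launched (the channel names and S3 prefixes below are placeholders; the real ones depend on how `run_translation.py` reads its data):

# hypothetical launch of the training job; channel names and
# S3 paths are assumptions, not the ones from my actual notebook
huggingface_estimator.fit({
    "train": "s3://my-bucket/data/train/",
    "validation": "s3://my-bucket/data/validation/",
})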

Then, I uploaded my T5 model to the HF model hub as a private model.
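For reference, a minimal sketch of that upload step (the local output directory is a placeholder; the repo id and token are the same placeholders as below):

# push the fine-tuned checkpoint to the Hub as a private repo (sketch);
# './model_output', the repo id, and the token are placeholders
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("./model_output")
tokenizer = AutoTokenizer.from_pretrained("./model_output")

model.push_to_hub("xxxxxxx", private=True, use_auth_token="xxxxxxxx")
tokenizer.push_to_hub("xxxxxxx", private=True, use_auth_token="xxxxxxxx")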

Finally, I used AWS SageMaker Inference with the same library versions in the following code:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'xxxxxxx',    # model_id from hf.co/models
    'HF_TASK': 'text2text-generation',
    'HF_API_TOKEN': 'xxxxxxxx',  # my API token (the model is private)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version='4.12.3',  # transformers version used
    pytorch_version='1.9.1',        # pytorch version used
    py_version='py38',              # python version of the DLC
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
)

input_text = "xxxx"

data = {
    "inputs": input_text,
    "parameters": {
        "max_length": 32,        # same value as the one used for training
        "num_beams": 1,          # same value as the one used for training
        "early_stopping": True,  # same value as the one used for training
    }
}

# request
predictor.predict(data)
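As far as I know, the endpoint wraps a `text2text-generation` pipeline, which returns a list of dicts, so the output can be read like this:

# the pipeline response has the form [{"generated_text": "..."}]
result = predictor.predict(data)
print(result[0]["generated_text"])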

However, as said in my post, the predictions from `predictor.predict(data)` are different from the ones I get in a Colab notebook with the same PyTorch model and the same generation arguments (`num_beams`, …).
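For reference, this is roughly the Colab-side check I am comparing against (a sketch; the model id, token, and input text are the same placeholders as above):

# Colab-side reproduction (sketch): load the same private checkpoint from
# the Hub and pass the same generation arguments explicitly
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "xxxxxxx"  # same model id as in the endpoint's HF_MODEL_ID
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token="xxxxxxxx")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, use_auth_token="xxxxxxxx")

inputs = tokenizer("xxxx", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=32,
    num_beams=1,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))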

What do you think? Thank you for your help.