Hi @philschmid. I’m having difficulty understanding your sentence. When I have a model (generative or not), the same input text, and of course the same values for the arguments of the generate() method (num_beams, etc.), I do not understand how the output (i.e., the computation by the model) could differ.
I just published a simple Colab notebook (generate_method_T5.ipynb) and ran the generate() method 1000 times with the same input: the output is always the same (both PyTorch and pipeline()).
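To illustrate why I expect this, here is a minimal sketch (a toy stand-in, not the real transformers API): with no sampling enabled, decoding is a pure function of the weights and the input, so repeated calls must return identical outputs. The toy_model and greedy_generate names are my own illustration, not library code.

```python
def greedy_generate(logits_fn, input_ids, max_new_tokens=5, eos_id=0):
    """Minimal greedy decoding loop: always pick the argmax token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        scores = logits_fn(ids)                      # deterministic "model"
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

def toy_model(ids):
    # Fixed, deterministic stand-in for a model's forward pass:
    # the score of token t depends only on the current sequence.
    return [((sum(ids) + t) * 31) % 17 for t in range(10)]

# 1000 runs with the same input collapse to a single unique output.
runs = {tuple(greedy_generate(toy_model, [3, 1])) for _ in range(1000)}
assert len(runs) == 1
```

This matches what my notebook shows with the real generate(): same weights plus same input plus deterministic decoding gives the same output every time.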
Here, I understand. When you convert any model to the ONNX format, the values of its parameters change slightly, and this can produce an output different from that of the corresponding PyTorch model (but again, always the same output for the same input).
As a proof of concept, in the same Colab notebook (generate_method_T5.ipynb), I used the fastt5 library to get an ONNX model from the T5 one.
From the question “When is the birthday of Pierre?”:
- the PyTorch and pipeline() models give the answer “17 February”
- and the ONNX model gives “30 years, 160 days”.
I’m fine with that (at least, I understand it).
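The mechanism can be sketched with illustrative numbers (rounding weights to one decimal here is only a coarse stand-in for the precision loss of an export or quantization step; the values are invented for the example): tiny per-weight perturbations can accumulate, flip which token gets the highest logit, and therefore change the generated text entirely.

```python
def logits(weights, features):
    # One logit per candidate token: a simple dot product per row.
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

features = [1.0, 1.0, 1.0]

# Original weights: token 0 wins by a hair (logits 1.02 vs 1.01).
original = [[0.34, 0.34, 0.34],
            [0.26, 0.26, 0.49]]

# Simulate a lossy conversion by rounding each weight to 1 decimal.
# Row 0 becomes [0.3, 0.3, 0.3] -> 0.9; row 1 becomes [0.3, 0.3, 0.5] -> 1.1.
converted = [[round(w, 1) for w in row] for row in original]

assert argmax(logits(original, features)) == 0   # token 0 wins before
assert argmax(logits(converted, features)) == 1  # token 1 wins after
```

With greedy or beam decoding, one flipped token early in the sequence can cascade into a completely different answer, which is consistent with “17 February” vs “30 years, 160 days”.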
Yes, I’m using transformers 4.15 (I did test fastt5 with versions > 4.16, but I got an error when calling the generate() method).
Last point: in AWS SageMaker Inference, I used the PyTorch T5 model, not the ONNX one. Since I still saw a different output, does that mean the inference DLC applies some compression to T5 (ONNX or similar?) that could explain the discrepancy?