Hello @philschmid.
Thanks for your answer.
I’m not sure I understand. A T5 model does use the .generate()
method, which can perform beam search, to produce a translation. However, the default number of beams is 1, which means no beam search, as stated in the HF doc of the .generate() method:
> **num_beams** (`int`, *optional*, defaults to 1) – Number of beams for beam search. 1 means no beam search.
Therefore, by default in AWS SageMaker, the T5 (base) model runs greedy decoding (a single beam, no beam search) at each inference when predictor.predict(data)
is called, right? And if you confirm this point, it means that the DistilBERT model in the AWS SageMaker DLC is optimized, while the T5 model is not. What do you think?
Note: by the way, what would the code be in AWS SageMaker to increase the num_beams argument of .generate()?
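To make the question concrete: my understanding (please correct me if this is wrong) is that the Hugging Face Inference Toolkit forwards a `parameters` dict from the request payload to the underlying pipeline, so generation arguments like `num_beams` could be set per request. A minimal sketch, assuming a `predictor` already deployed for a text2text-generation task (the `max_length` value is just an illustrative choice):

```python
# Sketch: passing generation arguments through the request payload.
# Assumes the Hugging Face Inference Toolkit forwards "parameters"
# to the pipeline / .generate() call.
data = {
    "inputs": "translate English to French: The house is wonderful.",
    "parameters": {
        "num_beams": 4,    # enable beam search (default is 1 = no beam search)
        "max_length": 50,  # illustrative cap on output length
    },
}

# predictor = huggingface_model.deploy(...)  # deployed elsewhere
# result = predictor.predict(data)           # not executed here
```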
Well, I have a question: when will HF optimize Seq2Seq models like T5 in the AWS SageMaker DLC?
Let’s say it won’t happen until next year. That means I need to do it myself today. Could you first validate the following steps?
- Fine-tune a T5 base model on a downstream task, either in AWS SageMaker or in another environment (GCP, local GPU, etc.)
- Convert the fine-tuned T5 base model to ONNX format (with fastT5, for example)
- Upload the ONNX T5 base model to S3 on AWS
- Use the ONNX T5 base model in the AWS SageMaker DLC to make inferences
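For steps 3 and 4, my current understanding (to be confirmed) is that the SageMaker HF DLC expects a model.tar.gz on S3, with a custom code/inference.py needed to serve an ONNX model, since the stock handler loads regular transformers weights. A sketch of the packaging side, using only the standard library for the archive; the bucket/key names and the commented-out upload/deploy calls are assumptions, not tested code:

```python
import os
import tarfile


def package_model(model_dir: str, archive_path: str = "model.tar.gz") -> str:
    """Bundle an exported ONNX model directory into the model.tar.gz
    layout the SageMaker Hugging Face DLC expects (model files at the
    archive root, a custom handler under code/)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            # Add each file/directory at the top level of the archive.
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


def s3_model_uri(bucket: str, key: str) -> str:
    """Build the S3 URI to pass as model_data when creating the endpoint."""
    return f"s3://{bucket}/{key}"


# Step 3 - upload, e.g. with boto3 (not executed here):
# boto3.client("s3").upload_file("model.tar.gz", "my-bucket", "t5/model.tar.gz")
#
# Step 4 - deployment via the SageMaker SDK, pointing model_data at the
# uploaded archive (a code/inference.py overriding model loading would be
# required inside the archive to run the ONNX model):
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(model_data=s3_model_uri("my-bucket", "t5/model.tar.gz"),
#                          role=role, transformers_version="...", pytorch_version="...")
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```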
The last question is: where can I find the code for steps 3 and 4?
Thanks for your help.