How to deploy a T5 model to AWS SageMaker for fast inference?

philschmid · November 18, 2021, 7:41am

Thanks for opening the thread and I am happy to hear the workshop material was enough to get you started!

Currently, the models aren’t optimized automatically. If you would like to run optimized models, you need to optimize them yourself and then provide them.

Regarding your speed assumption:

“There are 4 times more parameters in T5 base than distilbert-base-uncased, but its inference time is 10 times slower on the same instance (type: ml.m5.xlarge) of AWS SageMaker.”

That’s because the two models have different architectures, were trained on different tasks, and use different methods for inference. For example, T5 uses the .generate method with beam search to create your translation, which means it does not run a single forward pass through the model. The decoder runs again for every generated token, so there can be many passes per request.
So the latency difference between distilbert and T5 makes sense and is not related to SageMaker.
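To make the point about multiple forward passes concrete, here is a rough sketch in plain Python. It is only a toy cost model that counts model calls. It does not load or time a real model, and the function names, the output length of 20 tokens, and the beam width of 4 are all made up for illustration.

```python
# Toy cost model: counts forward passes only. No real model is loaded or timed.
# All function names and the numbers below are hypothetical.


def encoder_only_passes() -> int:
    """An encoder-only model such as distilbert-base-uncased: one pass per request."""
    return 1


def seq2seq_generate_passes(generated_tokens: int) -> dict:
    """An encoder-decoder model such as T5 driven by .generate.

    The encoder runs once over the input. The decoder then runs once per
    generated token, because each new token depends on the previous ones.
    """
    return {
        "encoder": 1,
        "decoder": generated_tokens,
        "total": 1 + generated_tokens,
    }


def decoder_sequences_scored(generated_tokens: int, num_beams: int) -> int:
    """With beam search, each decoder step processes num_beams candidate sequences."""
    return generated_tokens * num_beams


if __name__ == "__main__":
    tokens, beams = 20, 4  # assumed output length and beam width
    print("encoder-only passes:", encoder_only_passes())
    print("seq2seq passes:", seq2seq_generate_passes(tokens))
    print("beam candidates scored:", decoder_sequences_scored(tokens, beams))
```

The counts are not latencies, since pass cost depends on sequence length, caching, and hardware, but they show why parameter count alone does not predict how much slower a generation model will be.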
