How to deploy a T5 model to AWS SageMaker for fast inference?

Hey @pierreguillou,

Thanks for opening the thread! I'm happy to hear the workshop material was enough to get you started.

Currently, the models aren't optimized automatically. If you want to run an optimized model, you need to optimize it yourself (e.g., export it to ONNX or quantize it) and then provide the optimized artifacts when you deploy.
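As a rough sketch of what that could look like, here is one way to do it with the `optimum` library (an assumption on my side; the exact export API can differ between versions):

```python
# Minimal sketch: exporting T5 to ONNX Runtime with Hugging Face Optimum.
# Assumes `pip install optimum[onnxruntime]`; details may vary by version.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

# Save the optimized artifacts; these are what you would package into
# a model.tar.gz and upload to S3 for your SageMaker endpoint.
save_dir = "t5-base-onnx"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```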

Regarding your speed assumption:

> There are 4 times more parameters in T5-base than in distilbert-base-uncased, but its inference time is 10 times slower on the same AWS SageMaker instance (type: ml.m5.xlarge).

That's because the two models have different architectures, were trained on different tasks, and use different methods for inference. For example, T5 uses the `.generate` method with beam search to create your translation, which means it is not running just 1 forward pass through the model: the decoder runs once per generated token, and beam search multiplies that work by the number of beams.
So the latency difference between DistilBERT and T5 makes sense and is not related to SageMaker.
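To make the difference concrete, here is a small illustrative sketch (model names taken from your comparison; the timings it prints are illustrative, not a benchmark): DistilBERT answers with a single forward pass, while T5's `.generate` loops the decoder for every new token, for every beam.

```python
import time
import torch
from transformers import AutoTokenizer, AutoModel, T5ForConditionalGeneration

# DistilBERT: a single forward pass per request
db_tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
db_model = AutoModel.from_pretrained("distilbert-base-uncased")
inputs = db_tok("SageMaker is great", return_tensors="pt")
start = time.perf_counter()
with torch.no_grad():
    db_model(**inputs)  # one encoder pass, done
print(f"distilbert forward: {time.perf_counter() - start:.3f}s")

# T5: .generate runs the decoder step by step, once per new token,
# and beam search multiplies the work by the number of beams
t5_tok = AutoTokenizer.from_pretrained("t5-base")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base")
inputs = t5_tok("translate English to French: SageMaker is great",
                return_tensors="pt")
start = time.perf_counter()
t5_model.generate(**inputs, num_beams=4, max_length=50)
print(f"t5 generate (beam search): {time.perf_counter() - start:.3f}s")
```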
