How to deploy a T5 model to AWS SageMaker for fast inference?

Hello @OlivierCR.

You are right about GPU vs CPU inference time, but I am running my tests with the same configuration for both models (distilbert-base-uncased and T5 base).
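For reference, here is a minimal sketch of the kind of timing comparison I am doing, assuming plain `transformers` pipelines on CPU (the input text and the warm-up step are illustrative, not the exact SageMaker test):

```python
# Minimal timing sketch: load both models the same way and time one call each.
# Absolute numbers will vary by machine; only the ratio matters here.
import time
from transformers import pipeline

# The classification head of distilbert-base-uncased is untrained here;
# that is fine because we only care about forward-pass latency.
clf = pipeline("text-classification", model="distilbert-base-uncased")
gen = pipeline("text2text-generation", model="t5-base")

text = "translate English to French: The house is wonderful."

# Warm up once so first-call overhead is not measured.
clf(text)
gen(text)

start = time.perf_counter()
clf(text)
print(f"distilbert-base-uncased: {(time.perf_counter() - start) * 1000:.0f} ms")

start = time.perf_counter()
gen(text)
print(f"t5-base: {(time.perf_counter() - start) * 1000:.0f} ms")
```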

Regarding model size, we are not talking about large DL models here.

  • distilbert-base-uncased: 66 million parameters (source) / inference time: 70 ms
  • T5 base: 220 million parameters (source) / inference time: 700 ms

T5 base has roughly 3.3 times as many parameters as distilbert-base-uncased, yet its inference time is 10 times longer on the same AWS SageMaker instance (type: ml.m5.xlarge).
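For context, this is roughly how I deploy each model: a sketch assuming the SageMaker Hugging Face inference toolkit, where the IAM role ARN is a placeholder and the container versions are examples to adapt to what your region supports.

```python
# Deployment sketch using the SageMaker Hugging Face inference toolkit.
# Swap HF_MODEL_ID / HF_TASK to deploy distilbert-base-uncased instead.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "t5-base",            # or "distilbert-base-uncased"
        "HF_TASK": "text2text-generation",   # "text-classification" for DistilBERT
    },
    role="arn:aws:iam::<account>:role/<sagemaker-role>",  # placeholder
    transformers_version="4.26",  # example versions; use a supported combination
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # same CPU instance for both models
)

print(predictor.predict({"inputs": "translate English to French: Hello world."}))
```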

Clearly, I could use a more powerful instance and both inference times would improve, but that would not explain why inference is so slow for a Seq2Seq model like T5 base on AWS SageMaker.
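One factor I want to rule out, besides instance size, is autoregressive decoding: T5 runs its decoder once per generated token, while distilbert-base-uncased needs only a single forward pass. A quick hedged check, assuming a recent `transformers` version that supports `max_new_tokens`/`min_new_tokens`, is to time generation at different output lengths:

```python
# Sketch to check how much latency comes from autoregressive decoding:
# if latency grows roughly linearly with the number of generated tokens,
# decoding length, not a missing optimization, dominates the gap.
import time
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
inputs = tokenizer("translate English to French: The house is wonderful.",
                   return_tensors="pt")

for n_tokens in (1, 8, 32, 64):
    start = time.perf_counter()
    # Force exactly n_tokens generated tokens so runs are comparable.
    model.generate(**inputs, max_new_tokens=n_tokens, min_new_tokens=n_tokens)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{n_tokens:>3} generated tokens: {elapsed:.0f} ms")
```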

I think that T5 base is not optimized in AWS SageMaker the way the BERT models are (through ONNX, for example), but only the HF team can confirm that, I guess.
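In the meantime, one way I plan to test the ONNX hypothesis myself is with Hugging Face Optimum. A sketch, assuming a recent `optimum` version where `export=True` triggers the ONNX export (older versions used `from_transformers=True`):

```python
# Export t5-base to ONNX Runtime via Optimum and run generation with it;
# timing this against the PyTorch version would show how much ONNX helps.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
ort_model = ORTModelForSeq2SeqLM.from_pretrained("t5-base", export=True)

inputs = tokenizer("translate English to French: Hello world.",
                   return_tensors="pt")
outputs = ort_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```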