How to deploy a T5 model to AWS SageMaker for fast inference?

Hi @philschmid.

I’m getting back to you about using AWS SageMaker for inference with a text2text-generation model like T5.

My objective is to run an ONNX T5 model for inference, but in order to understand the logic behind the SageMaker Hugging Face Inference Toolkit, I started with a plain T5 model from the HF Hub.

For that, I’m using your notebook deploy_transformer_model_from_hf_hub.ipynb.

It worked, but I was surprised to get a different predicted text than the one I get when I run the same model in a notebook.

Since, as I understand it, the HF deployment code on AWS SageMaker uses pipeline(), my hypothesis is that arguments like num_beams and max_length have default values that I need to change.
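For context, this is roughly what my local notebook call looks like; the model name and argument values below are placeholders, the point is just that num_beams and max_length are set explicitly here, whereas the deployed pipeline() presumably falls back to its defaults:

```python
# Generation arguments I pass explicitly in my local notebook (values are
# illustrative). My hypothesis is that the deployed endpoint does not use
# these and instead relies on pipeline() defaults, hence the different output.
generation_kwargs = {
    "num_beams": 4,      # beam search instead of the default greedy decoding
    "max_length": 128,   # allow longer outputs than the default cap
}

# Local usage (sketch, not run here):
# from transformers import pipeline
# t2t = pipeline("text2text-generation", model="t5-base")
# print(t2t("translate English to German: Hello, my name is T5.",
#           **generation_kwargs))
```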

So my question is: how can I change the values of these arguments when deploying on AWS SageMaker? Thanks.
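In case it helps frame the question, here is my current guess, assuming (I haven't confirmed this) that the Inference Toolkit forwards a `parameters` field of the request body as keyword arguments to `pipeline()`:

```python
import json

# Hypothetical request payload: "inputs" is the text to process, and
# "parameters" would (if supported) be passed through to pipeline() as
# generation keyword arguments.
payload = {
    "inputs": "translate English to German: Hello, my name is T5.",
    "parameters": {"num_beams": 4, "max_length": 128},
}

# The predictor returned by HuggingFaceModel.deploy() would then be called as:
# predictor.predict(payload)
body = json.dumps(payload)  # what would go over the wire as JSON
```

Is something like this the intended way to override the pipeline defaults, or does it require a custom inference script?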