How to deploy a T5 model to AWS SageMaker for fast inference?

Hi @philschmid.

I’m getting back to you about using AWS SageMaker for inference with a text2text-generation model like T5.

My objective is to run an ONNX T5 model for inference, but in order to understand the logic behind the SageMaker Hugging Face Inference Toolkit, I started with a plain T5 model from the HF Hub.

For that, I’m using your notebook deploy_transformer_model_from_hf_hub.ipynb.

It worked, but I was surprised to get a different predicted text than the one I get when I run the same model in a notebook.

Since, as I understand it, the HF deployment code on AWS SageMaker uses pipeline(), my hypothesis is that arguments like num_beams and max_length have default values that I need to change.
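For context, this is roughly what my local notebook call looks like; the model name and argument values below are placeholders, the point is just that num_beams and max_length are set explicitly here, whereas the deployed pipeline() presumably falls back to its defaults:

```python
# Generation arguments I pass explicitly in my local notebook (values are
# illustrative). My hypothesis is that the deployed endpoint does not use
# these and instead relies on pipeline() defaults, hence the different output.
generation_kwargs = {
    "num_beams": 4,      # beam search instead of the default greedy decoding
    "max_length": 128,   # allow longer outputs than the default cap
}

# Local usage (sketch, not run here):
# from transformers import pipeline
# t2t = pipeline("text2text-generation", model="t5-base")
# print(t2t("translate English to German: Hello, my name is T5.",
#           **generation_kwargs))
```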

So my question is: how can I change the values of these arguments when deploying on AWS SageMaker? Thanks.
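In case it helps frame the question, here is my current guess, assuming (I haven't confirmed this) that the Inference Toolkit forwards a `parameters` field of the request body as keyword arguments to `pipeline()`:

```python
import json

# Hypothetical request payload: "inputs" is the text to process, and
# "parameters" would (if supported) be passed through to pipeline() as
# generation keyword arguments.
payload = {
    "inputs": "translate English to German: Hello, my name is T5.",
    "parameters": {"num_beams": 4, "max_length": 128},
}

# The predictor returned by HuggingFaceModel.deploy() would then be called as:
# predictor.predict(payload)
body = json.dumps(payload)  # what would go over the wire as JSON
```

Is something like this the intended way to override the pipeline defaults, or does it require a custom inference script?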