Hello @philschmid.
Thanks for your answer.
I’m not sure I understand. A T5 model does use the .generate()
method, which can perform beam search, to produce a translation. However, the default number of beams is 1, which means no beam search, as stated in the HF doc of the .generate() method:
> **num_beams** (`int`, *optional*, defaults to 1) – Number of beams for beam search. 1 means no beam search.
Therefore, by default in AWS SageMaker, the T5 (base) model runs greedy decoding (a single beam, no beam search) at each inference when predictor.predict(data)
is called, right? And if you confirm this point, it means that the DistilBERT model in the AWS SageMaker DLC is optimized, while the T5 model is not. What do you think?
Note: by the way, what would the code be in AWS SageMaker to increase the num_beams argument of .generate()?
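To make the question concrete: my understanding (please correct me if this is wrong) is that the Hugging Face Inference Toolkit forwards a `parameters` dict from the request payload to the underlying pipeline, so generation arguments like `num_beams` could be set per request. A minimal sketch, assuming a `predictor` already deployed for a text2text-generation task (the `max_length` value is just an illustrative choice):

```python
# Sketch: passing generation arguments through the request payload.
# Assumes the Hugging Face Inference Toolkit forwards "parameters"
# to the pipeline / .generate() call.
data = {
    "inputs": "translate English to French: The house is wonderful.",
    "parameters": {
        "num_beams": 4,    # enable beam search (default is 1 = no beam search)
        "max_length": 50,  # illustrative cap on output length
    },
}

# predictor = huggingface_model.deploy(...)  # deployed elsewhere
# result = predictor.predict(data)           # not executed here
```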
Well, I have a question: when will HF optimize Seq2Seq models like T5 in the AWS SageMaker DLC?
Let’s say it won’t happen until next year. That means I need to do it myself today. Could you first validate the following steps?
- Fine-tune a T5 base model on a downstream task, either in AWS SageMaker or in another environment (GCP, local GPU, etc.)
- Convert the fine-tuned T5 base model to ONNX format (with fastT5, for example)
- Upload the ONNX T5 base model to S3 on AWS
- Use the ONNX T5 base model in the AWS SageMaker DLC to make inferences
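For steps 3 and 4, my current understanding (to be confirmed) is that the SageMaker HF DLC expects a model.tar.gz on S3, with a custom code/inference.py needed to serve an ONNX model, since the stock handler loads regular transformers weights. A sketch of the packaging side, using only the standard library for the archive; the bucket/key names and the commented-out upload/deploy calls are assumptions, not tested code:

```python
import os
import tarfile


def package_model(model_dir: str, archive_path: str = "model.tar.gz") -> str:
    """Bundle an exported ONNX model directory into the model.tar.gz
    layout the SageMaker Hugging Face DLC expects (model files at the
    archive root, a custom handler under code/)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            # Add each file/directory at the top level of the archive.
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


def s3_model_uri(bucket: str, key: str) -> str:
    """Build the S3 URI to pass as model_data when creating the endpoint."""
    return f"s3://{bucket}/{key}"


# Step 3 - upload, e.g. with boto3 (not executed here):
# boto3.client("s3").upload_file("model.tar.gz", "my-bucket", "t5/model.tar.gz")
#
# Step 4 - deployment via the SageMaker SDK, pointing model_data at the
# uploaded archive (a code/inference.py overriding model loading would be
# required inside the archive to run the ONNX model):
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(model_data=s3_model_uri("my-bucket", "t5/model.tar.gz"),
#                          role=role, transformers_version="...", pytorch_version="...")
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```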
The last question is: where can I find the code for steps 3 and 4?
Thanks for your help.