Seq2seq decoding is inherently slow, and using ONNX is one obvious way to speed it up. The onnxt5 package already provides one way to use ONNX with T5. But if we export the complete T5 model to ONNX, we can’t use past_key_values for decoding, since for the first decoding step past_key_…

Speeding up T5 inference 🚀

kira March 15, 2021, 9:29am 14

Thank you, @valhalla! I created a new thread here.
