ONNX makes it possible to compress Transformers models and speed up inference on CPU and GPU.
Could anyone share code or a notebook to convert mT5 and ByT5 models to the ONNX format?
There is @kira's fastT5 library for T5 conversion (great!), but it has not been updated to the latest version of transformers, and therefore it does not accept mT5 and ByT5 models.
List of topics about this subject:
List of online documents about this subject:
Hi @pierreguillou, I recently tried to use fastT5 on an mT5 model and it worked great (regardless of the warning); however, you'll need the change noted below.
Are you getting any errors, or are the outputs nonsense?
Note: I changed the ConditionalGeneration class in onnx_exporter.py, but I still think it was working without this.
Thanks @JulesBelveze. I guess you made the right choice with the change to the ConditionalGeneration class in onnx_exporter.py of fastT5 (but it means that we need to manually change the official fastT5 library…).
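For readers wondering what that manual change looks like, here is a sketch of the kind of edit involved. The file paths are real, but the exact lines, variable names, and call sites in onnx_exporter.py will differ between fastT5 versions — treat this as illustrative, not the actual patch:

```diff
--- a/fastT5/onnx_exporter.py
+++ b/fastT5/onnx_exporter.py
-from transformers import T5ForConditionalGeneration
+from transformers import MT5ForConditionalGeneration
 ...
-model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
+model = MT5ForConditionalGeneration.from_pretrained(model_name_or_path)
```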
In the case of ByT5, it is different, as the ByT5Tokenizer is not part of …
How can a ByT5 model be converted to the ONNX format?
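One reason ByT5 is a special case: it has no subword vocabulary at all — the tokenizer just maps raw UTF-8 bytes to ids. A minimal pure-Python sketch of that mapping, written from ByT5's documented convention (pad=0, eos=1, unk=2, so byte `b` maps to id `b + 3`) rather than by calling the transformers library; the function names here are hypothetical:

```python
def byt5_encode(text: str, eos_id: int = 1, offset: int = 3) -> list[int]:
    """Map a string to ByT5-style token ids: one id per UTF-8 byte, plus EOS."""
    return [b + offset for b in text.encode("utf-8")] + [eos_id]

def byt5_decode(ids: list[int], offset: int = 3) -> str:
    """Invert the mapping, dropping special ids below the byte offset."""
    return bytes(i - offset for i in ids if i >= offset).decode("utf-8", errors="ignore")

# "héllo" is 6 UTF-8 bytes ("é" takes two), so 6 byte ids + 1 EOS = 7 ids.
ids = byt5_encode("héllo")
```

This byte-per-token scheme is also why ByT5 generation needs roughly one decoder step per character of output, which matters for ONNX inference speed.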
Argh, yes, for ByT5 it sounds like you'll need the fastT5 library to be updated, which unfortunately doesn't seem trivial.
fastT5 now works with transformers>=4.12.5 (this is the latest version I have tested). It also has some tweaks to the quantization settings that should speed it up a bit more.
I can confirm it works with ByT5 now, and it should work with mT5 as well (except for needing to change the ConditionalGeneration class, as noted above). However, ByT5 is substantially slower, due to both the larger encoder and needing one decoding step per character.
EDIT: I just opened a PR to explicitly support mt5 as well. I did some limited testing and it seems to work.