How to convert mT5 and ByT5 to ONNX format?

Hi,

Exporting to ONNX makes it possible to compress Transformers models and speed up inference on CPU and GPU.

Could someone share code or a notebook to convert mT5 and ByT5 models to the ONNX format?

There is the fastT5 library by @kira for T5 conversion (great!), but it has not been updated to the latest version of transformers and therefore does not accept mT5 and ByT5 models.
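For reference, the usual fastT5 workflow for a plain T5 model looks roughly like this (a minimal sketch adapted from the fastT5 README; the checkpoint name is just an example):

```python
# Minimal sketch of the standard fastT5 export path (per the fastT5 README).
from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = "t5-small"  # example checkpoint
# Exports encoder/decoder to ONNX, quantizes them, and wraps them
# in a model object exposing a generate() method.
model = export_and_get_onnx_model(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokens = tokenizer("translate English to French: Hello world", return_tensors="pt")
output = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
)
print(tokenizer.decode(output.squeeze(), skip_special_tokens=True))
```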

Thanks.

List of topics about this subject:

List of online documents about this subject:


Hi @pierreguillou, I recently tried fastT5 on an mT5 model and it worked great (despite the warnings); however, you’ll need to have transformers=="4.6.1".

Are you getting any error? Or are the outputs nonsense?

Note: I changed T5ForConditionalGeneration to MT5ForConditionalGeneration in onnx_exporter.py, but I think it was working even without this change.
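For anyone who wants to try the same without editing the installed library, one option is to monkeypatch the exporter module before exporting. This is only a sketch: it assumes fastT5 keeps the class reference at module level in onnx_exporter.py, as in the version discussed here, so verify against whatever version you have installed:

```python
# Sketch only: swap the class fastT5's exporter uses for the MT5 variant
# without modifying the library source. Assumes the module path
# fastT5.onnx_exporter and its module-level T5ForConditionalGeneration
# reference, as in fastT5 at the time of this thread.
from transformers import MT5ForConditionalGeneration
import fastT5.onnx_exporter as onnx_exporter

onnx_exporter.T5ForConditionalGeneration = MT5ForConditionalGeneration

from fastT5 import export_and_get_onnx_model
model = export_and_get_onnx_model("google/mt5-small")  # example mT5 checkpoint
```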

Thanks @JulesBelveze. I guess you made the right choice with the change from T5ForConditionalGeneration to MT5ForConditionalGeneration in fastT5’s onnx_exporter.py (but it means we have to manually patch the official fastT5 library…).

The case of ByT5 is different, as the ByT5Tokenizer is not part of transformers=="4.6.1".
How can a ByT5 model be converted to the ONNX format?

Argh, yes, for ByT5 it sounds like you’ll need the fastT5 library to be updated, which unfortunately doesn’t seem trivial :confused:

fastT5 now works with transformers>=4.12.5 (the latest version I have tested). It also has some tweaks to the quantization settings that should speed it up a bit more.
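If it helps, the export call itself is unchanged after upgrading; quantization is on by default and can be switched off for comparison (a sketch; the quantized parameter is documented in the fastT5 README):

```python
# pip install --upgrade fastt5
# Sketch: exporting with and without quantization (parameter per fastT5 README).
from fastT5 import export_and_get_onnx_model

quantized_model = export_and_get_onnx_model("t5-small")              # quantized by default
fp32_model = export_and_get_onnx_model("t5-small", quantized=False)  # full-precision export
```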

I can confirm it works with ByT5 now, and it should work with mT5 as well (except for needing to change the ConditionalGeneration class as noted above). However, ByT5 is substantially slower due to both the larger encoder and needing one decoding step per character, as the quick check below illustrates.
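You can see the length difference by comparing tokenized sequence lengths (a small illustration; the checkpoint names are examples and the counts are approximate):

```python
# ByT5 tokenizes to raw bytes, so sequences (and decoding steps) are several
# times longer than T5's subword sequences. Checkpoints below are examples.
from transformers import AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
t5_tok = AutoTokenizer.from_pretrained("t5-small")
byt5_tok = AutoTokenizer.from_pretrained("google/byt5-small")

print(len(t5_tok(text).input_ids))    # ~12 subword tokens
print(len(byt5_tok(text).input_ids))  # ~45 byte-level tokens (one per byte, plus EOS)
```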

EDIT: I just opened a PR to explicitly support mT5 as well. I did some limited testing and it seems to work.
