ONNX makes it possible to compress Transformers models and speed up inference on CPU and GPU.
Could anyone share code or a notebook to convert mT5 and ByT5 models to the ONNX format?
There is @kira's fastT5 library for T5 conversion (great!), but it has not been updated to the latest version of transformers, and therefore it does not accept mT5 and ByT5 models.
List of topics about this subject:
List of online documents about this subject:
Hi @pierreguillou, I recently tried to use fastT5 on an mT5 model and it worked great (regardless of the warning); however, you'll need the change noted below.
Are you getting any errors, or are the outputs nonsense?
Note: I changed the ConditionalGeneration class in onnx_exporter.py, but I still think it was working without this.
Thanks @JulesBelveze. I guess you made the right choice with the change to the ConditionalGeneration class in onnx_exporter.py of fastT5 (but it means that we need to manually change the official fastT5 library…).
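For readers wondering what that manual change looks like, here is a sketch of the kind of edit involved. The file paths are real, but the exact lines, variable names, and call sites in onnx_exporter.py will differ between fastT5 versions — treat this as illustrative, not the actual patch:

```diff
--- a/fastT5/onnx_exporter.py
+++ b/fastT5/onnx_exporter.py
-from transformers import T5ForConditionalGeneration
+from transformers import MT5ForConditionalGeneration
 ...
-model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
+model = MT5ForConditionalGeneration.from_pretrained(model_name_or_path)
```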
In the case of ByT5, it is different, as the ByT5Tokenizer is not part of …
How can a ByT5 model be converted to the ONNX format?
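One reason ByT5 is a special case: it has no subword vocabulary at all — the tokenizer just maps raw UTF-8 bytes to ids. A minimal pure-Python sketch of that mapping, written from ByT5's documented convention (pad=0, eos=1, unk=2, so byte `b` maps to id `b + 3`) rather than by calling the transformers library; the function names here are hypothetical:

```python
def byt5_encode(text: str, eos_id: int = 1, offset: int = 3) -> list[int]:
    """Map a string to ByT5-style token ids: one id per UTF-8 byte, plus EOS."""
    return [b + offset for b in text.encode("utf-8")] + [eos_id]

def byt5_decode(ids: list[int], offset: int = 3) -> str:
    """Invert the mapping, dropping special ids below the byte offset."""
    return bytes(i - offset for i in ids if i >= offset).decode("utf-8", errors="ignore")

# "héllo" is 6 UTF-8 bytes ("é" takes two), so 6 byte ids + 1 EOS = 7 ids.
ids = byt5_encode("héllo")
```

This byte-per-token scheme is also why ByT5 generation needs roughly one decoder step per character of output, which matters for ONNX inference speed.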
Argh, yes, for ByT5 it sounds like you'll need the fastT5 library to be updated, which unfortunately doesn't seem trivial.
fastT5 now works with transformers>=4.12.5 (this is the latest version I have tested). It also has some tweaks to the quantization settings that should speed it up a bit more.
I can confirm it works with ByT5 now, and it should work with mT5 as well (except for needing to change the ConditionalGeneration class, as noted above). However, ByT5 is substantially slower, due to both the larger encoder and needing one decoding step per character.
EDIT: I just opened a PR to explicitly support mt5 as well. I did some limited testing and it seems to work.