Export pretrained MT5 model to ONNX

Hi, I saw that someone was able to convert the speecht5-tts model to ONNX format.

In the discussion they said they used Optimum, but the unstable version at the time. However, they didn't elaborate on that, and when I looked through the code I couldn't find a pipeline that would support it anywhere.

Does anyone know how this could be done?

Hi @JamesXanda, you can export it with the CLI as follows:

optimum-cli export onnx --model microsoft/speecht5_tts test_onnx --model-kwargs '{"vocoder": "microsoft/speecht5_hifigan"}'

Let me know if that works on your side.
Note that you can use the latest stable release of Optimum for exporting this model, no need to install it from source.
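If you would rather trigger the export from a Python script, here is a minimal sketch that just wraps the same CLI call with subprocess. The helper names build_export_command and run_export are made up for illustration and are not part of Optimum; the sketch assumes optimum-cli is installed and on your PATH.

```python
import json
import subprocess


def build_export_command(model_id, output_dir, vocoder_id):
    """Build the argument list for the optimum-cli export shown above.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # --model-kwargs expects a JSON string, so serialize the dict here.
    model_kwargs = json.dumps({"vocoder": vocoder_id})
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        output_dir,
        "--model-kwargs", model_kwargs,
    ]


def run_export(model_id, output_dir, vocoder_id):
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_export_command(model_id, output_dir, vocoder_id), check=True)


# Example call (downloads the model, so it is left commented out):
# run_export("microsoft/speecht5_tts", "test_onnx", "microsoft/speecht5_hifigan")
```

Passing the arguments as a list avoids shell quoting issues with the JSON string.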

Ah yes, this worked. Are there any ORT pipelines for running it in Python?

For text-to-speech, no. However, it should probably be similar to the speech-to-text pipeline: optimum/optimum/onnxruntime/modeling_seq2seq.py at e3fd2776a318a3a7b9d33315cc42c04c181f6d2f · huggingface/optimum · GitHub
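Since there is no ready-made pipeline, a reasonable first step before writing your own inference loop is to check which ONNX files the export produced and what inputs and outputs each one expects. The sketch below assumes the export directory is test_onnx (as in the command above) and that onnxruntime is installed; the function names are hypothetical, and no file or tensor names are assumed, they are read from the exported graphs.

```python
from pathlib import Path


def list_onnx_files(export_dir):
    """Return the sorted names of all .onnx files in the export directory."""
    return sorted(p.name for p in Path(export_dir).glob("*.onnx"))


def describe_onnx_io(export_dir):
    """Map each exported .onnx file to its input and output signatures."""
    # Imported lazily so that list_onnx_files works without onnxruntime.
    import onnxruntime as ort

    description = {}
    for path in sorted(Path(export_dir).glob("*.onnx")):
        session = ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
        description[path.name] = {
            "inputs": [(i.name, i.type, i.shape) for i in session.get_inputs()],
            "outputs": [(o.name, o.type, o.shape) for o in session.get_outputs()],
        }
    return description


# Example call (needs the exported files on disk):
# for name, io in describe_onnx_io("test_onnx").items():
#     print(name, io)
```

The printed names tell you what to pass to each session's run call when you chain the exported parts together.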

Interesting, I would have thought it would be pretty similar to the Transformers.js text-to-speech pipeline, since that loads from the Optimum-exported ONNX weights.

Anyway, thank you for your help.

Here is an example snippet to make it work in Python: SpeechT5 ONNX support by fxmarty · Pull Request #1404 · huggingface/optimum · GitHub
It can probably be improved though.