Convert ASR to ONNX


It’s really nice that speech technology is now so easy to access thanks to new models such as facebook/wav2vec2-base-960h.

However, there is no built-in way to convert the model to ONNX.
No problem, we can just follow this nice PyTorch tutorial, which works like a charm!

However, when using the exported model from, let’s say, Node.js, there is no way to get the tokenizer back, because only the Python library defines Wav2Vec2Tokenizer.

How could we use it in other stacks?

I checked the source code of Wav2Vec2Tokenizer, and it appears to do only padding? So I guess it would not take long to replicate this behaviour in another language? Something like: get the raw audio array, pad it, convert it to an input tensor, run it through the ONNX model, and then decode the output with some logic and the vocab file.
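
To make the idea concrete, here is a minimal TypeScript sketch of the two pieces that would have to be reimplemented outside Python: the preprocessing and the decoding. It rests on several assumptions that should be checked against the Python source. As far as I can tell, the Python preprocessing also normalizes the waveform to zero mean and unit variance when do_normalize is enabled, so that step is included. The decoding is a plain greedy CTC decode, and it assumes the blank token is `<pad>` and the word delimiter is `|`. The function names (normalize, padTo, ctcGreedyDecode) are made up for this illustration, and running the ONNX model itself is left out.

```typescript
// Zero-mean, unit-variance normalization of a raw mono waveform.
// Assumption: this mirrors the Python preprocessing when do_normalize is on.
function normalize(samples: Float32Array, eps: number = 1e-7): Float32Array {
  let mean = 0;
  for (let i = 0; i < samples.length; i++) mean += samples[i];
  mean /= samples.length;

  let variance = 0;
  for (let i = 0; i < samples.length; i++) {
    const d = samples[i] - mean;
    variance += d * d;
  }
  variance /= samples.length;

  const scale = 1 / Math.sqrt(variance + eps);
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = (samples[i] - mean) * scale;
  return out;
}

// Right-pad with zeros up to `length` (only needed when batching inputs).
function padTo(samples: Float32Array, length: number): Float32Array {
  if (samples.length >= length) return samples;
  const out = new Float32Array(length); // typed arrays start zero-filled
  out.set(samples);
  return out;
}

// Index of the largest value in one frame of logits.
function argmax(frame: number[]): number {
  let best = 0;
  for (let i = 1; i < frame.length; i++) {
    if (frame[i] > frame[best]) best = i;
  }
  return best;
}

// Greedy CTC decoding: take the argmax per frame, collapse repeats,
// drop blanks, then map ids to tokens using the vocab file contents.
// `vocab` has the shape of vocab.json: token -> id.
function ctcGreedyDecode(
  logits: number[][],
  vocab: Record<string, number>,
  blankToken: string = "<pad>", // assumption: blank is the pad token
  wordDelimiter: string = "|"   // assumption: "|" separates words
): string {
  const idToToken: Record<number, string> = {};
  for (const token of Object.keys(vocab)) idToToken[vocab[token]] = token;
  const blankId = vocab[blankToken];

  let text = "";
  let previous = -1;
  for (const frame of logits) {
    const id = argmax(frame);
    if (id !== previous && id !== blankId) {
      const token = idToToken[id];
      text += token === wordDelimiter ? " " : token;
    }
    previous = id;
  }
  return text.trim();
}
```

The logits would be the model output for one audio clip, reshaped to one array of scores per time frame. The raw audio is expected to be mono at the sampling rate the model was trained on (16 kHz for this model, as far as I know).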

Thanks in advance and have a great day.