How to improve inference time of the facebook/mbart many-to-many model?

Hi @Vimal0703

One more option to improve speed is to use ONNX Runtime, but at the moment we don't have any tool/script that will let you export MBart to ONNX.

I’ve written a script for exporting T5 to ONNX; something similar could be used for MBart as well.
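For reference, here's a minimal sketch of what an MBart encoder export could look like with `torch.onnx.export`. This is not a complete solution (the decoder and the generation loop need separate handling for seq2seq models), and the checkpoint name, output file name, and opset version are my own assumptions:

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Checkpoint name is an assumption; any MBart variant should work the same way.
ckpt = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(ckpt)
tokenizer = MBart50TokenizerFast.from_pretrained(ckpt)
model.eval()


class EncoderWrapper(torch.nn.Module):
    """Wraps the encoder so it returns a plain tuple of tensors,
    which is easier for the ONNX tracer to handle than a ModelOutput."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=False,
        )


inputs = tokenizer("Hello world", return_tensors="pt")

# Export only the encoder; decoding still runs in PyTorch in this sketch.
torch.onnx.export(
    EncoderWrapper(model.get_encoder()),
    (inputs["input_ids"], inputs["attention_mask"]),
    "mbart_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)
```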

Another option: we have also ported the M2M100 model, which is a SOTA many-to-many translation model. The m2m100_418M checkpoint is smaller than mBART-50, so it can give more of a speed-up. Here’s an example of how to use M2M100:
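```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Translate Hindi to English: set the source language, then force the
# target language as the first generated token.
tokenizer.src_lang = "hi"
hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
encoded = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("en")
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
```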
