How to improve inference time of the facebook/mbart many-to-many model?

Hi @Vimal0703

One more option to improve speed is to use ONNX Runtime, but at the moment we don't have any tool/script that will let you export MBart to ONNX.

I’ve written a script for exporting T5 to ONNX; something similar could be used for MBart as well.
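For reference, here's a minimal sketch of what an MBart encoder export could look like with `torch.onnx.export`. This is not a complete solution (the decoder and the generation loop need separate handling for seq2seq models), and the checkpoint name, output file name, and opset version are my own assumptions:

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Checkpoint name is an assumption; any MBart variant should work the same way.
ckpt = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(ckpt)
tokenizer = MBart50TokenizerFast.from_pretrained(ckpt)
model.eval()


class EncoderWrapper(torch.nn.Module):
    """Wraps the encoder so it returns a plain tuple of tensors,
    which is easier for the ONNX tracer to handle than a ModelOutput."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=False,
        )


inputs = tokenizer("Hello world", return_tensors="pt")

# Export only the encoder; decoding still runs in PyTorch in this sketch.
torch.onnx.export(
    EncoderWrapper(model.get_encoder()),
    (inputs["input_ids"], inputs["attention_mask"]),
    "mbart_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)
```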

Another option: we have also ported the M2M100 model, which is a SOTA many-to-many translation model. The m2m100_418M checkpoint is smaller than mBART-50, so it can give more of a speed-up. Here’s an example of how to use M2M100:
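```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Translate Hindi to English: set the source language, then force the
# target language as the first generated token.
tokenizer.src_lang = "hi"
hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
encoded = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("en")
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
```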
