How to improve inference time of the facebook/mbart many-to-many model?

Vimal0703 · March 2, 2021, 4:27am

When we run a translation service with the facebook mbart many-to-many model on CPU, it takes 9 seconds to translate. How can we reduce the inference time further…

lewtun · March 2, 2021, 8:04am

Hi @Vimal0703, one idea could be to try quantizing the model’s weights to a lower precision datatype. See e.g. step 2 in this guide: Dynamic Quantization — PyTorch Tutorials 1.7.1 documentation

This usually gives you a 2-3x reduction in both latency and model size.
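For readers looking for a starting point, here is a minimal sketch of the dynamic quantization step, not the script used in this thread. It assumes the checkpoint is `facebook/mbart-large-50-many-to-many-mmt` and that `torch` and `transformers` are installed. The helper names (`quantize_for_cpu`, `mean_latency`, `compare_mbart_latency`) are made up for illustration. Only the `nn.Linear` layers are quantized, which is what the PyTorch tutorial does.

```python
import time


def quantize_for_cpu(model):
    """Return a copy of `model` whose nn.Linear weights are stored as int8."""
    import torch  # lazy import: only needed when a model is actually quantized

    # Dynamic quantization: weights are converted ahead of time,
    # activations are quantized on the fly during inference.
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )


def mean_latency(fn, repeats=5):
    """Average wall-clock seconds per call to `fn`, after one warm-up call."""
    fn()  # warm-up, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def compare_mbart_latency(text="Hello, how are you?", src="en_XX", tgt="fr_XX"):
    """Download MBart-50 and time fp32 vs. int8 translation on CPU."""
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    name = "facebook/mbart-large-50-many-to-many-mmt"  # assumed checkpoint
    tokenizer = MBart50TokenizerFast.from_pretrained(name, src_lang=src)
    model = MBartForConditionalGeneration.from_pretrained(name).eval()
    quantized = quantize_for_cpu(model)

    batch = tokenizer(text, return_tensors="pt")
    # Language codes such as "fr_XX" are tokens in the MBart-50 vocabulary.
    bos = tokenizer.convert_tokens_to_ids(tgt)

    def run(m):
        return m.generate(**batch, forced_bos_token_id=bos)

    return {
        "fp32_seconds": mean_latency(lambda: run(model)),
        "int8_seconds": mean_latency(lambda: run(quantized)),
    }
```

Calling `compare_mbart_latency()` downloads the model and returns the two average timings, so you can check the speed-up on your own hardware rather than rely on the rough 2-3x figure. Translation quality can shift slightly after quantization, so it is worth comparing a few outputs as well.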

Vimal0703 · March 15, 2021, 12:02pm

Thank you, it worked. We were able to quantize the model, and translation now takes around 1-2 seconds. Is there any way to decrease the time further…

valhalla · March 15, 2021, 3:47pm

Hi @Vimal0703

One more option to improve the speed is to use ONNX Runtime, but at the moment we don’t have any tool/script that will let you export MBart to ONNX.

I’ve written a script for exporting T5 to ONNX; something similar could be used for MBart as well.

Another option: we have also ported the M2M100 model, which is a SOTA many-to-many translation model. The m2m100_418M checkpoint is smaller than MBart50, so it can give a further speed-up. Here’s an example of how to use it: M2M100
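As a rough illustration of the M2M100 suggestion, the sketch below translates one sentence with the `facebook/m2m100_418M` checkpoint. It assumes `transformers`, `torch`, and `sentencepiece` are installed. The helper names `load_m2m100` and `translate` are hypothetical. The model is loaded once and cached, since reloading it on every request would dominate latency in a translation service.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_m2m100(name="facebook/m2m100_418M"):
    """Load tokenizer and model once; later calls reuse the cached pair."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(name)
    model = M2M100ForConditionalGeneration.from_pretrained(name).eval()
    return tokenizer, model


def translate(text, src_lang, tgt_lang):
    """Translate `text` between M2M100 language codes such as "en" and "fr"."""
    tokenizer, model = load_m2m100()
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    # The target language is selected by forcing its token as the first output.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

For example, `translate("Life is like a box of chocolates.", "en", "fr")` returns the French translation as a string. The first call downloads the checkpoint.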

addressoic · June 10, 2021, 4:59pm

Are you able to share this script or the settings for quantizing this model? I would greatly appreciate it, as I’m also going through a similar undertaking!

@valhalla Not to be greedy, but was anyone able to provide a script for this? Otherwise I would like to work on this.

Marsius · October 4, 2022, 1:31pm

@Vimal0703 @addressoic can you please share the script? I have a similar task…
