MBART50 .generate() is very slow

yoshi · July 21, 2021, 2:20pm

Hello,

I am currently working on the MBART50 many-to-one model for translation. The model takes a really long time to generate the translation. Is this normal? How can we optimize it?

I tried on both CPU and GPU, but generation remains slow on each:

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-one-mmt")

Inference time in seconds for model.generate(**input, max_length=max_length), where input is a single tokenized string of 1024 tokens:

max_length	8 CPUs	1 GPU
200	~38s	~4s
512	~105s	~11s
750	~160s	~16s
1024	~237s	~22s
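
To make the trend in these numbers easier to see, here is a small sketch that divides each timing by max_length. It assumes that generation actually runs all the way to max_length in every case, which I have not verified; the helper name seconds_per_token is just for illustration.

```python
# Timings copied from the table above: (max_length, CPU seconds, GPU seconds).
timings = [
    (200, 38, 4),
    (512, 105, 11),
    (750, 160, 16),
    (1024, 237, 22),
]


def seconds_per_token(rows):
    """Return (max_length, cpu_s_per_token, gpu_s_per_token) for each row.

    Assumption: every run generated exactly max_length tokens.
    """
    return [(n, cpu / n, gpu / n) for n, cpu, gpu in rows]


for n, cpu, gpu in seconds_per_token(timings):
    print(f"max_length={n:4d}  CPU ~{cpu * 1000:.0f} ms/token  GPU ~{gpu * 1000:.1f} ms/token")
```

Under that assumption the cost per generated token is roughly constant (about 0.02 s on the GPU and about 0.2 s on the CPUs), so the total time grows roughly linearly with max_length, as one would expect from token-by-token decoding.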

It takes this long for just one string, and batching does not make it any faster. Any idea what is wrong or how to optimize it?
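
For reference, a minimal sketch of how the single-string and batched timings could be measured. The helpers time_call and time_generate are hypothetical names, the source language "fr_XX" is only a placeholder assumption, and the heavy imports are kept inside the function so the timing helper can be used on its own.

```python
import time

MODEL_NAME = "facebook/mbart-large-50-many-to-one-mmt"  # checkpoint named in this post


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_generate(texts, max_length, device="cpu", src_lang="fr_XX"):
    """Time model.generate() on a list of strings (hypothetical helper).

    src_lang is an assumption; replace it with the real source language code.
    """
    # Imported lazily so time_call works without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, MBartForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.src_lang = src_lang
    model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME).to(device).eval()

    # Pad to the longest string in the batch and cap inputs at 1024 tokens.
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True, max_length=1024
    ).to(device)

    def run():
        with torch.no_grad():  # inference only, no gradients needed
            out = model.generate(**inputs, max_length=max_length)
        if str(device).startswith("cuda"):
            torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
        return out

    output_ids, seconds = time_call(run)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True), seconds
```

Calling time_generate([text], max_length=200) and then time_generate([text] * 8, max_length=200) would give the single-string and batched timings to compare; note that the measured time here excludes model loading and tokenization.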

Thank you!
