Increase the speed of the MBart model

Hello, I want to use MBart for translation tasks, but the translation is too slow: it takes over a minute for 2000 characters, when I need a translation in a few seconds. Is there any way to increase the speed of this model (threading, sharding, etc.)?

My current code:

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import time

article = "Hello world."

# Load the many-to-many multilingual model and its tokenizer
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

time1 = time.time()

# Translate English -> French
tokenizer.src_lang = "en_XX"
encoded_input = tokenizer(article, return_tensors="pt")
generated_tokens = model.generate(**encoded_input, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
print(time.time() - time1)

For 2 words, the translation takes 2.5 seconds. I can't use a GPU.
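One technique worth trying for CPU-only inference is PyTorch's dynamic int8 quantization, which converts the model's `Linear` layers to int8 weights with no retraining. This is not from the original post, just a hedged sketch: it is shown below on a small stand-in module so it runs quickly, but the same `torch.quantization.quantize_dynamic` call accepts the `MBartForConditionalGeneration` instance loaded above.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer's feed-forward layers. For the real case,
# you would pass the MBart model object instead of this Sequential.
float_model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

# Convert Linear layers to int8 weights; activations are quantized
# dynamically at runtime. CPU-only, no retraining required.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

# Optionally cap intra-op threads on a shared machine; on a dedicated
# box the default (all cores) is usually best.
torch.set_num_threads(4)

x = torch.randn(1, 512)
out = quantized_model(x)
print(out.shape)  # same shape as the float model's output
```

Other knobs that may help, assuming defaults are being used: `model.generate(..., num_beams=1)` to disable beam search, capping `max_length`, and tokenizing many sentences in one batched call instead of looping one sentence at a time.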

Hi, I’m struggling with the same problem, were you able to find a solution?