Hello, I want to use MBart for translation tasks, but the translation is too slow: it takes over a minute for 2000 characters, when I would need a translation in a few seconds. Is there any way to increase the speed of this model? Threading, sharding, etc.?
my current code:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import time

article = "Hello world."

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

time1 = time.time()
tokenizer.src_lang = "en_XX"
encoded_hi = tokenizer(article, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
print(time.time() - time1)
For 2 words, the translation takes 2.5 seconds. I can't use a GPU.
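For context, the CPU-only speedups I've read about are pinning the intra-op thread count and dynamic int8 quantization. A minimal sketch of what I mean, assuming plain PyTorch on CPU (`torch.set_num_threads` and `torch.quantization.quantize_dynamic` are the real PyTorch APIs; the tiny `nn.Sequential` here is just a stand-in for MBart, which is mostly `Linear` layers):

```python
import torch
import torch.nn as nn

# Toy stand-in for the MBart model; quantize_dynamic works the same way
# on any nn.Module containing nn.Linear layers.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

# Control intra-op parallelism (threading) explicitly on CPU.
torch.set_num_threads(4)

# Convert Linear weights to int8; activations are quantized on the fly at
# inference time. This usually shrinks the model and speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = quantized(x)
print(out.shape)  # torch.Size([1, 8])
```

Would applying this to the MBart checkpoint above be the right direction, or is there a better approach?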