Is Facebook NLLB too slow?

So, I needed an English translation of some texts. First I used googletrans, and this worked fine: it completed about 7-8 translations per second. Then I tried Facebook's distilled 600M NLLB model, but it took about 10 seconds per translation. I ran both on Google Colab. Is this expected, or is something wrong?

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# Odia -> English translation
translator_nllb = pipeline("translation", model=model, tokenizer=tokenizer, src_lang="ory_Orya", tgt_lang="eng_Latn", max_length=400)

translated_text_nllb = translator_nllb(text)[0]["translation_text"]

I’m seeing the same slowness. Is this expected for a 600M-parameter model? I get much faster responses from other autoregressive models of similar size. Can someone from HF chime in?

I have the same issue. I wonder if this could be solved somehow.

Hi,

You’re not using a GPU, which will indeed make it pretty slow. Try adding the device argument when instantiating the pipeline:

from transformers import pipeline
import torch

# Use the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = pipeline(model="facebook/nllb-200-distilled-600M", src_lang="eng_Latn", tgt_lang="nld_Latn", max_length=400, device=device)

translated_text = pipe("hello world")[0]["translation_text"]
print(translated_text)

In my case, I do have a GPU and pass it just like in your example, but it’s still too slow.

Then I recommend using CTranslate2, a C++ inference engine that supports these models. Inference with NLLB is shown here.

Thanks, it’s 5 times faster.
Don’t forget to set the device:

import ctranslate2
from transformers import AutoTokenizer

translator = ctranslate2.Translator("nllb-200-distilled-600M", device="cuda")  # path to the converted model directory
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang=src_lang)  # tokenizers take no device argument; they run on CPU