I needed English translations of some texts. First I used the googletrans package, and this worked fine: it completed about 7-8 translations per second. Then I tried Facebook's distilled 600M NLLB model, but each translation took about 10 seconds. I ran both on Google Colab. Is this expected, or is something wrong?
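For reference, the googletrans side was roughly this (a minimal sketch of the calls; the sample text is illustrative):

from googletrans import Translator

translator = Translator()
# googletrans calls the Google Translate web API, so each request is a
# network round trip rather than local model inference
result = translator.translate("Bonjour le monde", dest="en")
print(result.text)

The NLLB side: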
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load the distilled 600M NLLB checkpoint; without an explicit device
# this stays on CPU, even on a Colab GPU runtime
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
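And a minimal sketch of how the translation call looks with this model (the fra_Latn/eng_Latn codes and text are illustrative; device=0 assumes a GPU runtime and should be dropped on CPU):

translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="fra_Latn",  # NLLB uses codes like fra_Latn / eng_Latn
    tgt_lang="eng_Latn",
    max_length=400,
    device=0,             # run on the Colab GPU; omit to stay on CPU
)
print(translator("Bonjour le monde")[0]["translation_text"])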
I’m seeing this slowness as well. Is this expected for a 600M-parameter model? I get much faster responses from other autoregressive models of similar size. Can someone from HF chime in?
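For reference, the timing I'm looking at comes from a measurement like this (a sketch; it assumes the tokenizer/model loaded as in the snippet above, with French→English codes as illustrative stand-ins):

import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

tokenizer.src_lang = "fra_Latn"  # source-language token for NLLB
inputs = tokenizer("Bonjour le monde", return_tensors="pt").to(device)

start = time.time()
tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=400,
)
print(tokenizer.batch_decode(tokens, skip_special_tokens=True)[0])
print(f"{time.time() - start:.2f}s on {device}")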
When I run the conversion command ct2-transformers-converter --model nllb-200-distilled-600M --output_dir nllb-200-distilled-600M-CTranslate2, I get this error:
AttributeError: 'M2M100Encoder' object has no attribute 'embed_scale'
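My guess is a transformers/CTranslate2 version mismatch, since the converter reads model internals like embed_scale that transformers has moved between releases; upgrading ctranslate2 (or pinning transformers to an older release) may help, but that's an assumption worth checking against the CTranslate2 issue tracker. For context, once the conversion succeeds, the goal is to use the converted model along the lines of the CTranslate2 NLLB example (the language codes and sample text here are illustrative):

import ctranslate2
import transformers

translator = ctranslate2.Translator("nllb-200-distilled-600M-CTranslate2", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="fra_Latn"
)

# CTranslate2 works on token strings; the target language is passed as a prefix
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Bonjour le monde"))
results = translator.translate_batch([source], target_prefix=[["eng_Latn"]])
target = results[0].hypotheses[0][1:]  # drop the language-code token
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))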