Issue with MBart50 translation

AlanFeder · February 24, 2021, 5:51pm

Hi,

I am having an issue with the new MBart50 model, and I was wondering if you could help me figure out what I am doing wrong.

I am trying to adapt the example code from here; specifically, I tweaked it to translate a sentence from French into Persian.

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_fr = "Paris est toujours une bonne idee"

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# translate French to Persian
tokenizer.src_lang = "fr_XX"
encoded_fr = tokenizer(article_fr, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_fr,
    # force the first generated token to be the Persian language code
    forced_bos_token_id=tokenizer.lang_code_to_id["fa_IR"]
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))

but it then outputs

['Paris is always a good idea']

(which is obviously in English – not in Persian)
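When trying many sentences, it can help to flag this kind of failure automatically. A crude check is whether the decoded output is actually written in the Arabic script that Persian uses. This is a minimal sketch under stated assumptions: `arabic_script_ratio`, `looks_persian`, and the 0.5 threshold are hypothetical choices, and a script check is not real language identification (Arabic and Urdu use the same script), so it only catches the obvious case where the output comes back in Latin script.

```python
import unicodedata


def arabic_script_ratio(text: str) -> float:
    """Fraction of letters in `text` whose Unicode name starts with "ARABIC".

    Persian is written in the Arabic script, so a genuine Persian translation
    should score close to 1.0, while English or French output scores 0.0.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC"))
    return arabic / len(letters)


def looks_persian(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: treat output as Persian if most letters are Arabic script."""
    return arabic_script_ratio(text) >= threshold


# Decoded outputs seen in this thread (e.g. returned by tokenizer.batch_decode)
for output in ["Paris is always a good idea", "سلام"]:
    print(f"{output!r}: ratio={arabic_script_ratio(output):.2f}, persian={looks_persian(output)}")
```

Applied to the list returned by `batch_decode`, this would mark the English result above as not Persian.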

How can I get it to output Persian? I am already passing the "fa_IR" code via lang_code_to_id as forced_bos_token_id.

Thanks

neverNull · February 24, 2021, 7:23pm

I get the following back. However, longer strings yield French output, and I'm not 100% sure why; I assume it is due to a lack of training sentence pairs. Good luck!

سلام (Persian for “hello”)

This was returned by the following snippet:

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_fr = "Bonjour"

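# note: this checkpoint is fine-tuned to translate from English (en_XX) into 49 other languages,
# so a French source is outside what it was trained for
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")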
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="fr_XX")

model_inputs = tokenizer(article_fr, return_tensors="pt")

generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["fa_IR"]
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
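A rough way to picture why passing the language code alone may not be enough: `forced_bos_token_id` pins only the first generated token (the target language code), roughly by masking every other token at that step, and every later token is still whatever the decoder scores highest. The pure-Python toy below assumes a hypothetical, deliberately English-biased scorer; `english_biased_scores`, `greedy_decode`, and `decode` are made-up names, not transformers APIs. The toy scorer ignores the language code entirely, which exaggerates things (the real mBART-50 decoder does see the code); it only shows that nothing after the first step is forced, so the output can carry `fa_IR` and still decode to English once special tokens are skipped.

```python
import math
from typing import Callable, Dict, List, Optional

Scores = Dict[str, float]

# Toy vocabulary: strings stand in for token ids. In mBART-50 the decoder
# starts from </s>, and the language code is the first generated token.
EOS = "</s>"
SPECIAL = {EOS, "en_XX", "fa_IR", "fr_XX"}
ENGLISH = ["Paris", "is", "always", "a", "good", "idea", EOS]


def english_biased_scores(prefix: List[str]) -> Scores:
    """Hypothetical decoder: prefers en_XX first, then always the next English word.

    It ignores which language code was generated, to make the point obvious.
    """
    if len(prefix) == 1:
        return {"en_XX": 2.0, "fa_IR": 0.5}
    step = len(prefix) - 2  # tokens generated after the language code
    return {ENGLISH[step]: 1.0, "پاریس": 0.2}


def greedy_decode(
    next_token_scores: Callable[[List[str]], Scores],
    forced_bos_token: Optional[str] = None,
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding that can force the first generated token (toy version)."""
    sequence = [EOS]  # decoder start token
    while len(sequence) < max_len:
        scores = dict(next_token_scores(sequence))
        if forced_bos_token is not None and len(sequence) == 1:
            # First generation step only: mask every token except the forced one.
            scores = {tok: -math.inf for tok in scores}
            scores[forced_bos_token] = 0.0
        next_token = max(scores, key=scores.get)
        sequence.append(next_token)
        if next_token == EOS:
            break
    return sequence


def decode(tokens: List[str]) -> str:
    """Join tokens, dropping special tokens (like skip_special_tokens=True)."""
    return " ".join(t for t in tokens if t not in SPECIAL)


forced = greedy_decode(english_biased_scores, forced_bos_token="fa_IR")
unforced = greedy_decode(english_biased_scores)
print(forced)
print(decode(forced))
print(unforced)
```

In the toy, the forced and unforced runs differ only in the language token; decoding the raw ids of a real run with `skip_special_tokens=False` is a similar way to confirm that `fa_IR` was emitted even when the text is not Persian.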

AlanFeder · February 24, 2021, 7:44pm

Yes – thanks! I get the same results for “Bonjour” as well.
