Hi,
I am having an issue with the new MBart50 and was wondering if you could help me figure out what I am doing wrong.
I am adapting the example code from here; specifically, I tweaked it to translate a sentence from French into Persian.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
article_fr = "Paris est toujours une bonne idee"
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
# translate French to Persian
tokenizer.src_lang = "fr_XX"
encoded_fr = tokenizer(article_fr, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_fr,
    forced_bos_token_id=tokenizer.lang_code_to_id["fa_IR"],
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
but it then outputs
['Paris is always a good idea']
(which is obviously English, not Persian)
How can I get it to output Persian? As shown above, I passed tokenizer.lang_code_to_id["fa_IR"] as forced_bos_token_id.
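One check that might help narrow this down, as a minimal sketch rather than anything from the original example: test whether each generated sequence actually begins with the target-language token. The helper name starts_with_language and its max_prefix parameter are my own (hypothetical), and it assumes the forced language code appears within the first two output positions, which I have not confirmed for every library version.

```python
from typing import List, Sequence


def starts_with_language(
    sequences: Sequence[Sequence[int]], lang_id: int, max_prefix: int = 2
) -> List[bool]:
    """Return one flag per sequence: True if lang_id occurs in its first max_prefix tokens.

    Assumption: the forced language token sits near the start of the output
    (for example right after a decoder start token).
    """
    return [lang_id in list(seq[:max_prefix]) for seq in sequences]


# Intended use with the objects from the snippet above (not run here):
#   flags = starts_with_language(
#       generated_tokens.tolist(), tokenizer.lang_code_to_id["fa_IR"]
#   )
#   print(flags)
```

If a flag comes back False, the forced_bos_token_id argument may not be taking effect at all, which would point at the generate call or the installed version rather than at the language code.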
Thanks