I am having an issue with the new MBart50 model, and I was wondering if you could help me figure out what I am doing wrong.
I am trying to adapt the code from here; specifically, I tweaked it to translate a sentence from French into Persian:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_fr = "Paris est toujours une bonne idee"

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# translate French to Persian
tokenizer.src_lang = "fr_XX"
encoded_fr = tokenizer(article_fr, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_fr,
    forced_bos_token_id=tokenizer.lang_code_to_id["fa_IR"]
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
but it then outputs
['Paris is always a good idea']
(which is obviously in English – not in Persian)
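One sanity check that might help narrow this down is to see whether the forced language token actually ended up in the generated ids. The helper below is only a sketch and is not part of the example I copied: `forced_language_applied` is a hypothetical name, the ids in the demo are made up, and it assumes the output of `generate()` starts with the decoder start token followed by the forced language code.

```python
def forced_language_applied(generated_ids, expected_lang_id):
    """Return True if every sequence has expected_lang_id as its second token.

    Assumption: each generated sequence begins with the decoder start token,
    immediately followed by the forced BOS (target language code) token.
    """
    return all(
        len(seq) > 1 and seq[1] == expected_lang_id
        for seq in generated_ids
    )


# Made-up ids for illustration only; real values come from the tokenizer.
fake_output = [[2, 250029, 1234, 5678, 2]]
print(forced_language_applied(fake_output, 250029))  # True
print(forced_language_applied(fake_output, 250004))  # False
```

With the code above, this would be called as `forced_language_applied(generated_tokens.tolist(), tokenizer.lang_code_to_id["fa_IR"])`. If it returns False, the `forced_bos_token_id` argument is probably not being applied during generation at all.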
How can I get it to output in Persian? I tried using the