How do I fine-tune BART for machine translation?

I was trying to fine-tune the “facebook/bart-base” model on a Telugu dataset. I trained my own custom tokenizer and used it, but the results are still not even close to correct. Here’s the space:

What do I do? Does the BART model only work well for European and Slavic languages?
This is the script:

Please someone help me out!
Thank You

The documentation for facebook/bart-base says the model was pre-trained on English. I think fine-tuning is not enough and you would have to pre-train the model for Telugu.


Yes, it’s not possible to train a custom tokenizer and then use it with a model that already has a fixed vocabulary. If you train a new tokenizer, you would also need to train the model from scratch.
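To see why, here is a toy illustration (with made-up vocabularies, not BART’s real ones): token IDs are just row indices into the model’s embedding matrix, so a freshly trained tokenizer’s IDs point at rows that were trained for completely different tokens.

```python
# Toy vocabularies for illustration only (hypothetical, not BART's).
original_vocab = {"<s>": 0, "</s>": 1, "the": 2, "cat": 3, "sat": 4}
custom_vocab = {"<s>": 0, "</s>": 1, "నేను": 2, "తెలుగు": 3, "నేర్చు": 4}

def encode(text, vocab):
    """Map whitespace-split tokens to IDs (toy stand-in for a tokenizer)."""
    return [vocab[tok] for tok in text.split()]

# The custom tokenizer assigns ID 3 to "తెలుగు" ...
ids = encode("తెలుగు", custom_vocab)

# ... but the pre-trained model's embedding row 3 was trained for "cat",
# so the model effectively receives "cat", not "తెలుగు".
id_to_original_token = {i: t for t, i in original_vocab.items()}
print([id_to_original_token[i] for i in ids])
```

This is why pairing a new tokenizer with old weights produces garbage: the embeddings have no idea what the new IDs are supposed to mean.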

BART itself is indeed an English-only model, so I’d recommend starting from a multilingual pre-trained model to fine-tune on Telugu, such as M2M100 or NLLB.
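As a minimal sketch of the multilingual route: NLLB-200 already covers Telugu via its FLORES-200 language codes (`tel_Telu`), so you can often get reasonable output before any fine-tuning. The checkpoint name and language codes below are the publicly documented ones, but double-check them against the model card; the heavy model download is kept under the main guard.

```python
# FLORES-200 codes NLLB expects (assumed subset, for illustration).
NLLB_CODES = {"english": "eng_Latn", "telugu": "tel_Telu", "hindi": "hin_Deva"}

def nllb_code(language: str) -> str:
    """Look up the FLORES-200 code NLLB uses for a language name."""
    try:
        return NLLB_CODES[language.lower()]
    except KeyError:
        raise ValueError(f"No NLLB code known for {language!r}")

if __name__ == "__main__":
    # Downloads the checkpoint (a couple of GB) on first run.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "facebook/nllb-200-distilled-600M"
    tokenizer = AutoTokenizer.from_pretrained(name, src_lang=nllb_code("english"))
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    inputs = tokenizer("How are you?", return_tensors="pt")
    # Force the decoder to start generating in Telugu.
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(nllb_code("telugu")),
        max_length=64,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```

From there you can fine-tune the same checkpoint on your Telugu dataset with `Seq2SeqTrainer`, keeping NLLB’s own tokenizer rather than a custom one.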


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.