I was trying to fine-tune the “facebook/bart-base” model on a Telugu-language dataset. I trained a custom tokenizer and used it, but the results are still not even close to correct. Here’s the Space:
What should I do? Does the BART model only work well for European and Slavic languages?
This is the script:
The documentation for facebook/bart-base says that BART was pre-trained on English. I think fine-tuning is not enough; you would have to pre-train the model for Telugu.
Yes, it’s not possible to train a custom tokenizer and then use it with a model that already has a fixed vocabulary. If you train a new tokenizer, you also need to train the model from scratch, because the model’s embeddings are tied to the token ids of the original vocabulary.
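To see why this fails, here is a toy illustration (plain Python, not real BART code, and the vocabularies are made up): the model’s embedding table is indexed by token id, and those ids only mean what the model learned under the original vocabulary.

```python
# Toy illustration of the tokenizer/model mismatch. Not real BART code:
# the vocabularies and "embedding table" below are invented for the demo.

original_vocab = {"<s>": 0, "hello": 1, "world": 2}        # vocab the model was trained with
custom_vocab   = {"<s>": 0, "నమస్కారం": 1, "ప్రపంచం": 2}     # new custom Telugu tokenizer

# Pretend embedding table learned during pre-training:
# row i is the vector the model associates with ORIGINAL token id i.
embeddings = {0: "vec(<s>)", 1: "vec(hello)", 2: "vec(world)"}

# Encoding Telugu text with the custom tokenizer produces id 1,
# but the pretrained model still interprets row 1 as "hello".
telugu_id = custom_vocab["నమస్కారం"]
print(embeddings[telugu_id])  # → vec(hello): the wrong learned representation
```

So the new tokenizer’s ids silently look up embeddings learned for completely different (English) tokens, which is why the outputs come out as garbage rather than merely low-quality Telugu.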
BART itself is indeed an English model, so I’d recommend starting from a different, multilingual pre-trained model to fine-tune on Telugu, such as M2M100 or NLLB.
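A minimal sketch of the NLLB route, assuming the public `facebook/nllb-200-distilled-600M` checkpoint; NLLB uses FLORES-200 language codes, and `tel_Telu` is the code for Telugu. The function name is just for the sketch:

```python
# Sketch: loading an NLLB-200 checkpoint for Telugu fine-tuning instead
# of facebook/bart-base. Checkpoint name is one public option; adjust as needed.

TELUGU_CODE = "tel_Telu"  # FLORES-200 language code used by NLLB-200

def load_nllb_for_telugu(checkpoint: str = "facebook/nllb-200-distilled-600M"):
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # src_lang/tgt_lang tell the tokenizer which language tags to prepend.
    tokenizer = AutoTokenizer.from_pretrained(
        checkpoint, src_lang=TELUGU_CODE, tgt_lang=TELUGU_CODE
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    return tokenizer, model
```

Because NLLB’s tokenizer already covers Telugu, you can fine-tune directly on your dataset without training a new tokenizer or pre-training from scratch.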