mBART fine-tuning performs far worse than expected on an unseen language

Hi, I’m fine-tuning mBART-50-many-to-many-mt on a language that is unseen in its pre-training.

I did a lot of background research and found several papers reporting that fine-tuning NMT models on high-quality data for an unseen language works and gives decent results (BLEU ≈ 10).

When I try to replicate this, it doesn’t work at all (BLEU 0.1 after 5 epochs), and I don’t know what I’m doing wrong. I basically followed Hugging Face’s documentation to write the code, and I verified it was correct by cross-checking against a GitHub repo of someone who fine-tuned the same model.

A little more context

1. The dataset consists of En->Xx sentence pairs.

2. I used the auto tokenizer and Hugging Face's Trainer to train the model.

3. As for the arguments, the important ones are LR: 5e-4, epochs: 5 (runtime constraints), batch size: 16 (memory constraints), optimizer: AdamW. The loss improved from 3.3 to 0.8 over the 5 epochs, and BLEU went from 0.04 to 0.1 (I don't know if that counts as improvement).
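For reference, here is roughly how I understand the BLEU I'm reporting: a sacrebleu-style corpus BLEU on the 0-100 scale (unsmoothed, whitespace-tokenized; the sentences below are made up, just to illustrate the scale). Assuming my numbers are on the same 0-100 scale the papers use, 0.04 → 0.1 is still essentially zero n-gram overlap:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU on the 0-100 scale, unsmoothed, one reference each."""
    match = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0  # no smoothing: any empty n-gram order zeroes BLEU
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)

refs = ["the cat sat on the mat", "he went to the market"]
print(round(corpus_bleu(refs, refs), 1))                                  # → 100.0
print(round(corpus_bleu(["the the the the the the", "a a a a a"], refs), 1))  # → 0.0
```

So a perfect match scores 100, degenerate repeated output scores 0, and the papers' BLEU 10 sits well above my 0.1.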

I also went through the most common reasons this could happen and made sure not to overlook anything: the dataset quality is high, the tokenization is proper, and the arguments are reasonable. So I’m very lost as to why this is happening. Can anyone point out things I may have missed?
