mBART fine-tuning performs far worse than expected on an unseen language

Hi, I’m fine-tuning mBART-50-many-to-many-mt on a language that is unseen in its pre-training.

I did a lot of background research and found several papers reporting that fine-tuning NMT models on high-quality data for an unseen language works and gives decent results (BLEU ≈ 10).

When I try to replicate this, it doesn’t work at all (BLEU 0.1 after 5 epochs), and I don’t know what I’m doing wrong. I basically followed Hugging Face’s documentation to write the code, and I verified it was correct by cross-checking against a GitHub repo of someone who fine-tuned the same model.

A little more context

1. The dataset consists of En->Xx sentence pairs.

2. I used the auto tokenizer and Hugging Face's Trainer to train the model.

3. As for the arguments, the important ones are LR: 5e-4, epochs: 5 (runtime constraints), batch size: 16 (memory constraints), optimizer: AdamW. The loss improved from 3.3 to 0.8 over the 5 epochs, and BLEU went from 0.04 to 0.1 (I don't know if that counts as improvement).
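For reference, here is roughly how I understand the BLEU I'm reporting: a sacrebleu-style corpus BLEU on the 0-100 scale (unsmoothed, whitespace-tokenized; the sentences below are made up, just to illustrate the scale). Assuming my numbers are on the same 0-100 scale the papers use, 0.04 → 0.1 is still essentially zero n-gram overlap:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU on the 0-100 scale, unsmoothed, one reference each."""
    match = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0  # no smoothing: any empty n-gram order zeroes BLEU
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)

refs = ["the cat sat on the mat", "he went to the market"]
print(round(corpus_bleu(refs, refs), 1))                                  # → 100.0
print(round(corpus_bleu(["the the the the the the", "a a a a a"], refs), 1))  # → 0.0
```

So a perfect match scores 100, degenerate repeated output scores 0, and the papers' BLEU 10 sits well above my 0.1.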

I also went through the most common reasons this could happen and made sure not to overlook anything: the dataset quality is high, the tokenization is proper, and the arguments are reasonable. So I’m very lost as to why this is happening. Can anyone point out things I may have missed?
