Why does translation quality go down after fine-tuning for only one epoch?

This might be better suited to #beginners, but it is definitely #transformers-specific… I’m new to :hugs: and wanted to try model adaptation of one of the Helsinki-NLP MT models using fine-tuning.

I’ve created a DatasetDict with train, dev and test, and managed to load the pre-trained model and run a trainer for one epoch. My corpus is very small (<1k segments), so my expectation is that it would have little impact on the baseline model.

However, when I use the locally-saved config to translate, the results look more like the output of a model just starting its training. Is this expected, or am I doing something terribly wrong?

Thanks!

Hi dave-kudo,

[I am not an expert]

I think that would be expected. If you fine-tuned on your new corpus together with the original training corpus, your small corpus would have little impact. However, since you are training on your small corpus alone, the model “thinks” that the new corpus is the whole of its training data. It doesn’t “know” that it is already very well trained on the original data. It puts all its effort into optimising for the new corpus, and forgets a lot of what it learned earlier.

Have you tried using a smaller learning rate?

Another possibility is to freeze several of the early layers of the model, so that your fine-tuning is only affecting the last few layers.
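For what it's worth, here is a rough sketch of what freezing the early layers could look like, assuming a Marian-style model (the architecture used by the Helsinki-NLP checkpoints) in PyTorch. The tiny config and the `n_frozen` value are made-up placeholders so the snippet runs without downloading anything; with a real checkpoint you would load the model with `from_pretrained` instead.

```python
from transformers import MarianConfig, MarianMTModel

# Hypothetical tiny config, only so the sketch runs offline.
# With a real checkpoint you would call MarianMTModel.from_pretrained(...) instead.
config = MarianConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
    pad_token_id=99,
    eos_token_id=0,
    decoder_start_token_id=99,
)
model = MarianMTModel(config)

# Assumption: "early layers" means the first n_frozen encoder layers.
n_frozen = 1
for layer in model.model.encoder.layers[:n_frozen]:
    for param in layer.parameters():
        # Frozen parameters receive no gradient updates during training.
        param.requires_grad = False

# Count what the optimizer would still be allowed to update.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} of {total}")
```

How many layers to freeze is a judgment call; with a corpus this small, freezing more of the model is the more conservative choice.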

Ah, of course! I was missing something obvious… forgot to freeze the early layers. :man_facepalming: Thanks for your insight!

Actually, the real problem is even more embarrassing… Even without freezing layers, the output shouldn’t have looked the way it did. It turns out that I was using .from_config() instead of .from_pretrained() when loading my fine-tuned model. .from_config() only builds the architecture with freshly initialised weights, so I really was translating with an untrained model. Once I fixed that, the results were much more in line with my expectations.
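In case it helps anyone who lands here with the same symptom, this is a small sketch of the difference. It assumes PyTorch and uses a tiny, made-up Marian config as a stand-in for the fine-tuned model, so nothing is downloaded; the helper `same_weights` is my own, not part of the library.

```python
import tempfile

import torch
from transformers import (
    AutoConfig,
    AutoModelForSeq2SeqLM,
    MarianConfig,
    MarianMTModel,
)

# Hypothetical tiny model standing in for the fine-tuned one.
config = MarianConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
    pad_token_id=99,
    eos_token_id=0,
    decoder_start_token_id=99,
)
trained = MarianMTModel(config)


def same_weights(a, b):
    """Return True if both models hold identical tensors under every key."""
    sa, sb = a.state_dict(), b.state_dict()
    return all(torch.equal(sa[k], sb[k]) for k in sa)


with tempfile.TemporaryDirectory() as save_dir:
    trained.save_pretrained(save_dir)

    # Restores the architecture AND the saved weights.
    reloaded = AutoModelForSeq2SeqLM.from_pretrained(save_dir)

    # Builds the architecture from config.json only; weights are re-initialised.
    fresh = AutoModelForSeq2SeqLM.from_config(AutoConfig.from_pretrained(save_dir))

weights_restored = same_weights(trained, reloaded)
weights_random = not same_weights(trained, fresh)
print("from_pretrained restores the saved weights:", weights_restored)
print("from_config gives different weights:", weights_random)
```

So for inference after fine-tuning, the directory passed to save_pretrained (or the Trainer's output directory) should go to from_pretrained, not from_config.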