Why does translation quality go down after fine-tuning only one epoch?

dave-kudo · September 23, 2021, 6:16pm

This might be better suited to #beginners, but it is definitely #transformers-specific… I’m new to 🤗 Transformers and wanted to try adapting one of the Helsinki-NLP MT models through fine-tuning.

I’ve created a DatasetDict with train, dev and test, and managed to load the pre-trained model and run a trainer for one epoch. My corpus is very small (<1k segments), so my expectation is that it would have little impact on the baseline model.

However, when I use the locally-saved config to translate, the results look more like the output of a model just starting its training. Is this expected, or am I doing something terribly wrong?

Thanks!

rgwatwormhill · September 23, 2021, 10:19pm

hi dave-kudo,

[I am not an expert]

I think that would be expected. If you ran fine-tuning with your new corpus as well as the original training corpus, then your small corpus would have little impact. However, now that you are training only with your small corpus, the model “thinks” that the new corpus is the whole of its training data. It doesn’t “know” that it is already very well trained on the original data. It puts all its effort into optimising for the new corpus, which includes forgetting a lot of what it learned earlier.

Have you tried using a smaller learning rate?

Another possibility is to freeze several of the early layers of the model, so that your fine-tuning is only affecting the last few layers.
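
A minimal sketch of what freezing early layers could look like, assuming a Marian-style model (the architecture behind the Helsinki-NLP checkpoints). The tiny `MarianConfig` sizes and the helper name `freeze_early_encoder_layers` are made up for illustration so the example runs offline; in practice the model would come from `.from_pretrained()` with a real checkpoint name.

```python
from transformers import MarianConfig, MarianMTModel

# Hypothetical tiny configuration, used only so the sketch runs without
# downloading a checkpoint. A real Helsinki-NLP model is far larger.
config = MarianConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=4,
    decoder_layers=4,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
    pad_token_id=99,
    eos_token_id=0,
    decoder_start_token_id=99,
)
model = MarianMTModel(config)


def freeze_early_encoder_layers(model, num_layers):
    """Stop gradient updates for the token embeddings and the first
    `num_layers` encoder layers (illustrative helper, not a library API)."""
    encoder = model.get_encoder()
    # Marian shares the token embedding between encoder and decoder by
    # default, so freezing it here freezes it for both.
    for param in encoder.embed_tokens.parameters():
        param.requires_grad = False
    for layer in encoder.layers[:num_layers]:
        for param in layer.parameters():
            param.requires_grad = False


freeze_early_encoder_layers(model, num_layers=2)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

How many layers to freeze is a judgment call; with a corpus of under 1k segments, freezing more of the model and lowering the learning rate are both ways of limiting how far the weights drift from the baseline.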

dave-kudo · September 24, 2021, 9:47am

Ah, of course! I was missing something obvious… forgot to freeze the early layers. Thanks for your insight!

dave-kudo · September 24, 2021, 5:36pm

Actually, the real problem is even more embarrassing… Even without freezing layers, the output shouldn’t have looked the way it did. It turns out that I was using .from_config() instead of .from_pretrained() in trying to load my fine-tuned model. Once I fixed that, the results were much more in line with my expectations.
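
For anyone hitting the same thing, here is a small sketch of the difference. `.from_config()` builds the architecture described by the config and initialises the weights from scratch, while `.from_pretrained()` also loads the saved weights. The tiny `MarianConfig` below is a hypothetical stand-in for a fine-tuned checkpoint so that the example runs offline; the sizes are not those of any real Helsinki-NLP model.

```python
import tempfile

import torch
from transformers import AutoConfig, AutoModelForSeq2SeqLM, MarianConfig, MarianMTModel

# Hypothetical tiny model standing in for "my fine-tuned model".
tiny_config = MarianConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
    pad_token_id=99,
    eos_token_id=0,
    decoder_start_token_id=99,
)
fine_tuned = MarianMTModel(tiny_config)

with tempfile.TemporaryDirectory() as save_dir:
    # Writes both the config and the weights to disk.
    fine_tuned.save_pretrained(save_dir)

    # Wrong for inference: only the config is read, the weights are
    # freshly (randomly) initialised.
    local_config = AutoConfig.from_pretrained(save_dir)
    from_config_model = AutoModelForSeq2SeqLM.from_config(local_config)

    # Right: the config and the saved weights are both loaded.
    reloaded_model = AutoModelForSeq2SeqLM.from_pretrained(save_dir)

# Compare the input embedding matrix as a representative weight.
saved_emb = fine_tuned.get_input_embeddings().weight
reloaded_matches = torch.equal(reloaded_model.get_input_embeddings().weight, saved_emb)
from_config_matches = torch.equal(from_config_model.get_input_embeddings().weight, saved_emb)

print("from_pretrained keeps the saved weights:", reloaded_matches)
print("from_config keeps the saved weights:", from_config_matches)
```

A model built with `.from_config()` has never seen any data, which would explain output that looks like a model at the very start of training.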
