Why do transformers overfit quickly? How to solve it?

seyeeet · November 1, 2020, 5:55pm

Hi

I have a general question and appreciate your feedback on it.
I am new to transformers. My main problem is that the model overfits very quickly. I am using regularization methods such as augmentation and dropout, but after 2 epochs my validation accuracy starts to drop while the training accuracy reaches its maximum (basically, my model overfits).
Do you have any suggestions?
Interestingly, I never see this behavior when I use convolutions…

s4sarath · November 2, 2020, 8:09am

My personal thought is that if you have little data, the model will overfit quickly. If you want to avoid that, reduce the number of epochs. But the best way is to gather more data.
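One common way to "reduce epochs" without guessing the number up front is early stopping: watch the validation metric and stop once it has not improved for a few epochs. The sketch below is only an illustration of that idea, not something from this thread. The function name `best_epoch_with_patience`, the `patience` parameter, and the accuracy numbers are all made up for the example.

```python
def best_epoch_with_patience(val_scores, patience=2):
    """Scan per-epoch validation scores and decide where to stop.

    Returns (stop_epoch, best_epoch) as 0-based epoch indices.
    Training stops once `patience` epochs pass without a new best score.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # New best validation score: remember this epoch.
            best_score = score
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    # Never ran out of patience: trained through the last epoch.
    return len(val_scores) - 1, best_epoch


# Hypothetical validation accuracies, one per epoch (illustration only).
history = [0.71, 0.78, 0.80, 0.79, 0.77, 0.76]
stop_epoch, best_epoch = best_epoch_with_patience(history, patience=2)
print(f"stop after epoch {stop_epoch}, keep the checkpoint from epoch {best_epoch}")
```

In a real training loop you would compute the validation score after each epoch, save a checkpoint whenever it improves, and reload the best checkpoint after stopping.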

Having said that, neither transformers nor neural networks in general suffer too much from overfitting. There are some papers on this, I guess. They generalize well most of the time. Remember that transformer-like models have a large number of parameters, which is also one reason for overfitting. But in downstream tasks, even if the model overfits, it’s still useful, right?

Pre-training is like a person who graduates with a Master's. Fine-tuning is like doing a PhD (except here it is quick): you use your graduate skills to become an expert in a specific field. So overfitting is okay. Personal opinion only.

seyeeet · November 2, 2020, 4:35pm

Thanks for your reply. I have the same data in both cases.
But your answer actually helps. I think transformers just learn faster.

shi-feng · March 4, 2021, 5:34pm

Does data augmentation help?
Thanks. I ran into the same problem recently: the model reaches almost 99% training accuracy, but test accuracy always stays around 79%.

Best

kaankork · August 9, 2021, 7:36pm

Hint: If you identify overfitting, use your validation set to tune your model's hyperparameters. Once that’s done, use your unseen test set to do your final testing.

pacoelflaco · November 25, 2022, 8:36pm

Hey there, sorry for bumping this thread. I found your reply interesting, so I have to ask: how would you use the validation set to tune the model's hyperparameters?

tk2801 · February 28, 2023, 7:03pm

Since you never train on the validation set, you can train multiple models on the same training dataset while adjusting the hyperparameters. You can then select the model that performs best on the validation set (given your metric of choice).
My understanding is that you can then test your model on your test set or, even better, on another test set that you did not create and that has at least a fair number of samples that were not in the training set.
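The workflow described here can be sketched roughly as follows. This is a toy illustration under loud assumptions: a tiny 1-D k-nearest-neighbour classifier stands in for the transformer, `k` stands in for whatever hyperparameters you are tuning, and the datasets and helper names (`train_knn`, `predict`, `accuracy`) are invented for the example.

```python
def train_knn(train_set, k):
    # "Training" a k-NN model just means storing the data and k.
    return (list(train_set), k)


def predict(model, x):
    data, k = model
    # Take the k training points closest to x and do a majority vote.
    nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(model, dataset):
    return sum(predict(model, x) == y for x, y in dataset) / len(dataset)


# Made-up (feature, label) pairs; the label at x=2.0 is deliberately noisy.
train_set = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1), (5.0, 1)]
val_set = [(1.6, 0), (2.4, 0), (3.6, 1), (0.5, 0)]
test_set = [(4.5, 1), (0.2, 0)]

best_score, best_config, best_model = None, None, None
for k in [1, 3, 5]:  # candidate hyperparameter values
    model = train_knn(train_set, k)    # always fit on the training set only
    score = accuracy(model, val_set)   # compare candidates on validation data
    if best_score is None or score > best_score:
        best_score, best_config, best_model = score, {"k": k}, model

# The test set is touched exactly once, after the choice has been made.
test_score = accuracy(best_model, test_set)
print(best_config, best_score, test_score)
```

The point of the split is that the validation score is slightly optimistic for the chosen model (you picked it because it scored well there), so the final number should come from data that played no part in the selection.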
