I have a general question and would appreciate your feedback on it.
I am new to transformers. My main problem is that the model overfits very quickly. I am using regularization methods such as augmentation and dropout, but after 2 epochs my validation accuracy starts to drop while the training accuracy keeps climbing to its maximum (basically, my model overfits).
Do you have any suggestions?
Interestingly I never see this behavior when I use convolutions…
My personal thought is that if you have little data, the model will overfit quickly. To avoid that, train for fewer epochs. The best fix, though, is to gather more data.
Having said that, neither transformers nor neural networks in general suffer too much from overfitting; I believe there are some papers on this. They generalize well most of the time. Remember that transformer-like models have a large number of parameters, which is also one reason for overfitting. But in downstream tasks, even a model that overfits can still be useful, right?
Pre-training is like a person graduating with a Master's. Fine-tuning is like doing a PhD (except here it is quick): you use your graduate skills to become an expert in a specific field. So overfitting is okay. Personal opinion only.
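On the "train for fewer epochs" suggestion: rather than guessing the epoch count, a common approach is early stopping, i.e. stop once validation accuracy has not improved for a few epochs and keep the best checkpoint. Here is a minimal, framework-free sketch. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the accuracy numbers are all made up for illustration; they are not taken from the original poster's setup.

```python
def early_stopping_epoch(val_accuracies, patience=2, min_delta=0.0):
    """Decide where training would stop, given per-epoch validation accuracy.

    Returns (stop_epoch, best_epoch) as 0-based indices.
    Hypothetical helper for illustration only.
    """
    best_acc = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc + min_delta:
            # Validation accuracy improved: remember this epoch as the best one.
            best_acc = acc
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            # No improvement: count it, and stop once patience is used up.
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Never ran out of patience: training ran through every epoch.
    return len(val_accuracies) - 1, best_epoch


# Illustrative (invented) validation curve that peaks at the third epoch.
history = [0.70, 0.78, 0.80, 0.79, 0.77, 0.76]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=2)
print(f"stop after epoch {stop_epoch}, keep checkpoint from epoch {best_epoch}")
```

In a real training loop you would call the same logic once per epoch and save the model weights whenever the validation accuracy improves, so you can restore the best checkpoint afterwards.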
Thanks for your reply. I have the same data in both cases.
But your answer actually helps. I think transformers just learn faster.
Does data augmentation help?
Thanks, I ran into the same problem recently: the model reaches almost 99% training accuracy, but test accuracy always stays around 79%.
Hint: if you identify overfitting, use your validation set to tune your model's hyperparameters. Once that's done, use your unseen test set for the final evaluation.
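To make the validation/test separation concrete, here is a rough sketch of a three-way split over sample indices using only the standard library. The helper name `split_indices` and the 80/10/10 proportions are assumptions for illustration; pick fractions that suit your dataset size.

```python
import random


def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into train/val/test lists.

    Hypothetical helper for illustration; fractions are assumptions.
    """
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)

    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)

    test_idx = indices[:n_test]                # touched only once, at the very end
    val_idx = indices[n_test:n_test + n_val]   # used for hyperparameter tuning
    train_idx = indices[n_test + n_val:]       # used for fitting the model
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The point of keeping the test indices separate is that every decision (dropout rate, learning rate, number of epochs) is made by looking at the validation split only, so the final test number is not biased by tuning.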