How to Avoid Overfitting?

zanderbush · July 29, 2020, 6:02pm

I have a dataset with 7,000 lines that I plan to fine-tune GPT-2 on. I am unsure how many steps I should train it for, however. Does anybody have any advice on how I can avoid overfitting?

RichardWang · July 30, 2020, 7:08am

Avoiding overfitting is a big question, so I’ll just list some terms I know of or have heard that may help:
data augmentation, layer-wise learning rates, weight decay, gradient clipping.
There are lots of things you can explore.
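
To make three of those terms concrete, here is a minimal pure-Python sketch of layer-wise learning rates, gradient clipping, and weight decay. It is only an illustration under my own assumptions: the function names, the geometric decay factor, and the toy numbers are hypothetical and not from this thread. In PyTorch you would more likely reach for optimizer parameter groups, the `weight_decay` argument of `torch.optim.AdamW`, and `torch.nn.utils.clip_grad_norm_`.

```python
import math


def layerwise_lrs(base_lr, num_layers, decay=0.9):
    """Layer-wise learning rates: the top layer gets base_lr and each
    layer below it gets a geometrically smaller rate (assumed scheme)."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def clip_by_global_norm(grads, max_norm):
    """Gradient clipping: rescale gradients so that their combined
    L2 norm does not exceed max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """Plain SGD step with decoupled weight decay: besides following the
    gradient, every weight is shrunk slightly toward zero."""
    return [w - lr * g - lr * weight_decay * w for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # Hypothetical toy values, just to show how the pieces fit together.
    lrs = layerwise_lrs(base_lr=1e-4, num_layers=3, decay=0.5)
    grads = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    weights = sgd_step_with_weight_decay([1.0, -2.0], grads, lr=lrs[-1], weight_decay=0.01)
    print(lrs, grads, weights)
```

The idea behind all three is to limit how far and how fast the pretrained weights can move, which matters on a small dataset like 7,000 lines. None of this answers the "how many steps" part directly.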
