Learning rate and Data size

Onlydrinkwater · April 2, 2022, 7:01am

When fine-tuning a model like T5, should we tune the learning rate based on the length of the dataloader?

For example, I am now fine-tuning T5 for just one epoch on 1k sentence pairs with a batch size of 10, which means the optimizer will take 100 steps. I currently use a learning rate of 0.1, which gives me the lowest training loss without overfitting.

Now I increase the data to 1 million pairs. Should I divide the learning rate by 1000? Otherwise, the optimizer will take 100,000 steps at the previous learning rate, which may cause overfitting?

I’m now using Adam and Adafactor, which help adjust the step size.
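
To make the step counts concrete, here is a minimal sketch of the arithmetic. It assumes one optimizer step per batch (no gradient accumulation) and that a final partial batch still counts as a step; the helper name `steps_per_epoch` is just for illustration.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, assuming one step per batch
    and that a trailing partial batch is kept (not dropped)."""
    return math.ceil(num_examples / batch_size)


small = steps_per_epoch(1_000, 10)       # the 1k-pair run
large = steps_per_epoch(1_000_000, 10)   # the 1-million-pair run

# The larger run takes 1000x as many steps at the same batch size.
print(small, large, large // small)
```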

shaoniana1997 · April 2, 2022, 9:19am

Hot and sour, hot and sour, hot and sour, hot and sour, hot and sour
