Learning rate, LR scheduler and optimiser choice for fine-tuning GPT2

itsmejim · September 3, 2020, 4:56am

I know the best choice depends on the dataset being fine-tuned on, but I am curious: what combinations of learning rate, LR scheduler and optimiser have you found to work well in general? I am currently using AdamW with CosineAnnealingWarmRestarts, with the learning rate going from 0.002 down to 0.0001 and restarting at the end of each epoch.
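For readers who want to see what that schedule looks like, here is a minimal sketch of cosine annealing with warm restarts in plain Python. It assumes the scheduler is stepped once per batch and that the cycle length equals the number of batches per epoch (the `T_0` argument in PyTorch's `CosineAnnealingWarmRestarts`, with `eta_min` as the floor). The function name and the `steps_per_epoch` value are hypothetical, chosen only for illustration.

```python
import math


def cosine_warm_restart_lr(step, steps_per_epoch, lr_max=0.002, lr_min=0.0001):
    """Learning rate at a given optimizer step, restarting every epoch.

    lr_max and lr_min are the values quoted in the post; steps_per_epoch
    is an assumption and depends on dataset size and batch size.
    """
    # Position inside the current cycle; it resets to 0 at each epoch boundary.
    t_cur = step % steps_per_epoch
    # Cosine factor goes from 1 at the start of a cycle towards 0 at its end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t_cur / steps_per_epoch))
    return lr_min + (lr_max - lr_min) * cosine


if __name__ == "__main__":
    steps_per_epoch = 100  # hypothetical value for illustration
    for step in (0, 25, 50, 75, 99, 100):
        print(step, cosine_warm_restart_lr(step, steps_per_epoch))
```

The rate starts each epoch at 0.002, decays towards 0.0001 along a half cosine, and jumps back to 0.002 on the first step of the next epoch.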

prajjwal1 · September 3, 2020, 6:29am

You can refer to TrainingArguments to see the default values. They usually work well.
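As a rough reference, the TrainingArguments defaults are, to the best of my knowledge, AdamW with a learning rate of 5e-5 and a linear decay to zero with no warmup; check the documentation for the version you have installed, since defaults can change. The sketch below reproduces that shape in plain Python. The function name and `total_steps` value are hypothetical and are not part of the library.

```python
def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    base_lr=5e-5 and warmup_steps=0 are assumed to match the Trainer
    defaults; total_steps depends on dataset size, batch size and epochs.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical value for illustration
    for step in (0, 250, 500, 750, 1000):
        print(step, linear_schedule_lr(step, total_steps))
```

Compared with the cosine restart setup in the question, this peaks at a much lower rate (5e-5 versus 0.002) and never jumps back up during training.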
