Huggingface LR Decay Schedulers Spend the first epoch w/ an LR of 0

I don’t know if this is intended, or if I’m doing something wrong, but both in practice and from reading the code it looks to me like the LR schedulers in Transformers spend all of the first epoch with an LR of zero.

E.g., the polynomial decay scheduler uses PyTorch’s LambdaLR, which sets the LR to the initial LR multiplied by a decay factor. That factor is computed by passing an integer epoch parameter (which starts at zero) to the lr_lambda function specified here. This means that for all of epoch 0 the returned decay factor is 0, so the LR is set to zero as well.
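To make the arithmetic concrete, here is a minimal pure-Python sketch of the warmup/decay factor logic; it mirrors what I understand `get_polynomial_decay_schedule_with_warmup` to compute (the function name and default numbers below are illustrative, not copied from the library):

```python
# Sketch of the lr_lambda used by a polynomial-decay-with-warmup scheduler.
# LambdaLR multiplies the initial LR by the returned factor, so a factor of
# 0 means the optimizer trains with an LR of 0.
def lr_lambda(current_step, num_warmup_steps=500, num_training_steps=10_000,
              lr_init=5e-5, lr_end=1e-7, power=1.0):
    if current_step < num_warmup_steps:
        # Linear warmup: step 0 returns 0 / num_warmup_steps == 0.0,
        # which is exactly the "first call gives LR 0" behavior described above.
        return float(current_step) / float(max(1, num_warmup_steps))
    if current_step > num_training_steps:
        return lr_end / lr_init
    # Polynomial decay from lr_init down to lr_end over the remaining steps.
    lr_range = lr_init - lr_end
    remaining = 1 - (current_step - num_warmup_steps) / (num_training_steps - num_warmup_steps)
    return (lr_range * remaining ** power + lr_end) / lr_init

print(lr_lambda(0))  # 0.0 -> LR is lr_init * 0 = 0
print(lr_lambda(1))  # small but nonzero warmup factor
```

If this counter is only advanced once per epoch, it stays at 0 for every batch of the first epoch, which is the symptom above.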

I also see this in practice when using this scheduler together with a PyTorch Lightning model.
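The practical effect can be simulated without torch at all. Assuming (as Lightning does by default) that the scheduler counter is advanced once per epoch rather than once per batch, every batch of epoch 0 sees a factor of 0 (the warmup helper and step counts here are illustrative):

```python
# Simulate a training loop where the scheduler is stepped once per EPOCH.
def warmup_factor(step, num_warmup_steps=100):
    # Linear warmup factor, mirroring the scheduler's behavior at small steps.
    return min(1.0, step / num_warmup_steps)

lr_init = 5e-5
steps_per_epoch = 50
scheduler_counter = 0  # Lightning's default 'epoch' interval advances this once per epoch

lrs_epoch0 = []
for _ in range(steps_per_epoch):
    # The counter never moves during the epoch, so every batch uses factor 0.
    lrs_epoch0.append(lr_init * warmup_factor(scheduler_counter))
scheduler_counter += 1  # scheduler.step() fires only at epoch end

assert all(lr == 0.0 for lr in lrs_epoch0)  # the whole first epoch trained at LR 0
```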

Is this intended behavior for some reason? Or am I using this wrong or is this a bug in the HF library or something? Any insight would be greatly appreciated.

The LR scheduler in Lightning should be configured with interval='step' rather than the default 'epoch':
https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html?highlight=configure_optimizers#configure-optimizers
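Concretely, that means returning the scheduler from configure_optimizers in dict form with the interval set. This is a sketch assuming a LightningModule using transformers' get_polynomial_decay_schedule_with_warmup; the warmup/step counts and LR are placeholders:

```python
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=5e-5)
    scheduler = get_polynomial_decay_schedule_with_warmup(
        optimizer,
        num_warmup_steps=500,       # illustrative values
        num_training_steps=10_000,
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "interval": "step",     # step the scheduler every batch, not every epoch
        },
    }
```

With interval='step', the scheduler's counter advances every optimizer step, so the LR leaves zero after the very first batch instead of after the first epoch.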