I want to start training a new model by loading a model I trained previously. What happens to the learning rate in this case: does it start at the learning rate I set, or does it continue from the learning rate stored in the checkpoint?
I’m sure you’ve long since realized this (or maybe never had the problem), but I thought I’d mention it for the benefit of anyone else with questions about resuming.
Embarrassingly, I gave the checkpoint only to my model and not to my trainer, and only this morning realized that training won’t resume correctly this way. At the very least it fails to continue from the correct training epoch and step, but I think it also affects the scheduler in more subtle ways: I got significantly poorer training results when loading the checkpoint only into the model, even though I manually set the learning rate to the one stored in the checkpoint. I’m not sure exactly why this is… (non-linear learning rate decay, maybe?)
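To make the non-linear decay guess concrete, here is a toy sketch in plain Python. It assumes a cosine decay schedule; the names `cosine_lr`, `TOTAL_STEPS`, `BASE_LR` and `CKPT_STEP` are made up for illustration and are not taken from any particular trainer. It compares a true resume, where the scheduler's step counter is restored, with loading only the weights and setting the learning rate by hand.

```python
import math

def cosine_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical values, chosen only for illustration.
TOTAL_STEPS = 1000
BASE_LR = 1e-3
CKPT_STEP = 500  # step at which the checkpoint was saved

# The learning rate that would be stored in the checkpoint.
ckpt_lr = cosine_lr(BASE_LR, CKPT_STEP, TOTAL_STEPS)

def true_resume_lr(steps_after_ckpt: int) -> float:
    # Scheduler state restored: the step counter continues from CKPT_STEP.
    return cosine_lr(BASE_LR, CKPT_STEP + steps_after_ckpt, TOTAL_STEPS)

def weights_only_lr(steps_after_ckpt: int) -> float:
    # Only weights loaded: the schedule restarts at step 0, with the
    # checkpoint's learning rate set manually as the new base rate.
    return cosine_lr(ckpt_lr, steps_after_ckpt, TOTAL_STEPS)

for k in (0, 100, 250, 500):
    print(f"{k:4d} steps after checkpoint: "
          f"true resume lr={true_resume_lr(k):.2e}, "
          f"weights-only lr={weights_only_lr(k):.2e}")
```

In this toy setup the two learning rates agree at the moment of loading and then drift apart, because the restarted schedule decays from the beginning of the curve rather than from its midpoint. Whether this explains the poorer results depends on the scheduler actually in use. Optimizer state (momentum or Adam moment estimates, for example) is usually also kept in the checkpoint and may be another factor when only the weights are loaded.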
Anyway, hopefully this helps someone somewhere sometime.