resume_from_checkpoint does not configure the learning rate scheduler correctly

I am using the code at https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py to do pretraining. When I use resume_from_checkpoint to restart training from the step-2000 checkpoint, the learning rate scheduler behaves as follows.
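For reference, here is a minimal sketch of how resuming normally works, assuming the script uses the standard Hugging Face Trainer checkpoint layout (the checkpoint path and the `model` / `training_args` / `train_dataset` objects are illustrative, not taken from pretraining.py):

```python
import json
import torch
from transformers import Trainer

ckpt_dir = "outputs/checkpoint-2000"  # hypothetical checkpoint path

# A Trainer checkpoint directory should contain optimizer.pt, scheduler.pt
# and trainer_state.json; the scheduler state can be inspected directly.
sched_state = torch.load(f"{ckpt_dir}/scheduler.pt", map_location="cpu")
print("scheduler state:", sched_state)

with open(f"{ckpt_dir}/trainer_state.json") as f:
    print("global_step:", json.load(f)["global_step"])

# Resuming: pass the checkpoint path (or True for the latest checkpoint) so
# that optimizer and scheduler state are restored rather than recreated.
# `model`, `training_args` and `train_dataset` are assumed to be set up as
# in the original script.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train(resume_from_checkpoint=ckpt_dir)
```

If the learning rate after resuming does not continue from where it left off, it is worth checking whether scheduler.pt exists in the checkpoint directory and whether the script recreates the scheduler before the Trainer has a chance to reload it.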

Same problem! Any progress here?


Same problem. @stas, would you like to take a look?

You will need to file an issue with whoever wrote this code, as I'm not familiar with this repo.

Also, I no longer maintain the HF Transformers/DeepSpeed integration, in case this is related to HF Transformers.

Currently, all HF Transformers/DeepSpeed integration is done via HF Accelerate.
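For context, a minimal sketch of how Accelerate checkpoints a prepared LR scheduler (all objects, values, and the checkpoint path here are illustrative, not from the repo in question):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=100)

# Objects passed through prepare() are tracked by Accelerate, so their state
# (including the scheduler's step count) is covered by save_state/load_state.
model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)

accelerator.save_state("checkpoint_dir")  # writes model, optimizer, scheduler state
accelerator.load_state("checkpoint_dir")  # restores them when resuming
```

So if the issue turns out to be in the integration layer rather than the repo's own training loop, the Accelerate repository would be the place to report it.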