resume_from_checkpoint does not configure the learning rate scheduler correctly

I am using the code at https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py to do pretraining. When I use resume_from_checkpoint to restart training from the step-2000 checkpoint, the learning rate scheduler behaves as follows.
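For reference, here is a minimal sketch of how resuming normally works, assuming the script uses the standard Hugging Face Trainer checkpoint layout (the checkpoint path and the `model` / `training_args` / `train_dataset` objects are illustrative, not taken from pretraining.py):

```python
import json
import torch
from transformers import Trainer

ckpt_dir = "outputs/checkpoint-2000"  # hypothetical checkpoint path

# A Trainer checkpoint directory should contain optimizer.pt, scheduler.pt
# and trainer_state.json; the scheduler state can be inspected directly.
sched_state = torch.load(f"{ckpt_dir}/scheduler.pt", map_location="cpu")
print("scheduler state:", sched_state)

with open(f"{ckpt_dir}/trainer_state.json") as f:
    print("global_step:", json.load(f)["global_step"])

# Resuming: pass the checkpoint path (or True for the latest checkpoint) so
# that optimizer and scheduler state are restored rather than recreated.
# `model`, `training_args` and `train_dataset` are assumed to be set up as
# in the original script.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train(resume_from_checkpoint=ckpt_dir)
```

If the learning rate after resuming does not continue from where it left off, it is worth checking whether scheduler.pt exists in the checkpoint directory and whether the script recreates the scheduler before the Trainer has a chance to reload it.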

Same problem! Any progress here?


Same problem. @stas, would you like to take a look?

You will need to file an issue with whoever wrote this code, as I'm not familiar with this repo.

Also, I no longer maintain the HF Transformers/DeepSpeed integration, in case this is related to HF Transformers.

Currently, all HF Transformers/DeepSpeed integration is done via HF Accelerate.
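For context, a minimal sketch of how Accelerate checkpoints a prepared LR scheduler (all objects, values, and the checkpoint path here are illustrative, not from the repo in question):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=100)

# Objects passed through prepare() are tracked by Accelerate, so their state
# (including the scheduler's step count) is covered by save_state/load_state.
model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)

accelerator.save_state("checkpoint_dir")  # writes model, optimizer, scheduler state
accelerator.load_state("checkpoint_dir")  # restores them when resuming
```

So if the issue turns out to be in the integration layer rather than the repo's own training loop, the Accelerate repository would be the place to report it.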