I use the code at https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py to do pretraining. When I use resume_from_checkpoint to restart training from the step-2000 checkpoint, the learning rate scheduler looks like the plot below.

[plot of the learning-rate schedule after resuming]
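For context, here is a minimal sketch of resuming with the Hugging Face Trainer, which that script builds on. The model name, paths, dataset, and hyperparameters below are placeholders I made up, not values from pretraining.py; the point is only that passing a checkpoint directory to trainer.train() is expected to restore the optimizer and LR scheduler state along with the weights:

```python
# Minimal sketch of resuming with the Hugging Face Trainer.
# All names, paths, and hyperparameters here are illustrative placeholders.
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


class ToyDataset(Dataset):
    """Tiny stand-in dataset so the sketch is self-contained."""

    def __init__(self, tokenizer, n=64):
        self.ids = tokenizer("hello world", return_tensors="pt")["input_ids"][0]
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return {"input_ids": self.ids, "labels": self.ids}


tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

args = TrainingArguments(
    output_dir="outputs",
    max_steps=10_000,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    save_steps=2_000,
)

trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(tokenizer))

# Passing a checkpoint directory (created by an earlier run) is expected to
# restore the optimizer and LR scheduler state saved at step 2000, so the
# schedule should continue from step 2000 rather than restarting at step 0.
trainer.train(resume_from_checkpoint="outputs/checkpoint-2000")
```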
Same problem! Any progress here?
Same problem. @stas, would you like to take a look?
You will need to file an issue with whoever wrote this code, as I'm not familiar with this repo.
Also, I no longer maintain the HF Transformers/DeepSpeed integration, if it's related to HF Transformers.
Currently all HF Transformers/DeepSpeed integration is done via HF Accelerate.
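For anyone landing here later, a minimal sketch of that Accelerate-side route, assuming you drive DeepSpeed through Accelerate's DeepSpeedPlugin (the ZeRO stage and accumulation steps below are illustrative, not a recommendation):

```python
# Sketch: configuring DeepSpeed through HF Accelerate rather than the
# legacy Transformers-side integration. Values are illustrative only.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)

# The model, optimizer, and dataloader are then wrapped the usual way:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

Alternatively, running `accelerate config` and answering yes to DeepSpeed, then starting the script with `accelerate launch`, achieves the same thing without touching the code.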