Different models when loading checkpoint (run_mlm)

I’m continuing the training of ā€˜bert-base-uncased’ with run_mlm.py.
I expected these two options to produce the same model:

  1. Training for 1 epoch.
  2. Training for 2 epochs and saving a checkpoint after the first epoch.

Why are the first model (1 epoch) and the first-epoch checkpoint from the second run different?


And another question: is there a way to get the perplexity of each checkpoint?
I tried running the run_mlm script on each checkpoint (with the --do_eval flag, without the --do_train flag). It worked, though I’m not sure that’s the proper way…
The perplexity is quite different from training ā€œin one shotā€ up to that checkpoint (as in option 1 above).
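
For reference, here is a minimal sketch of what I mean, doing the same per-checkpoint evaluation directly with the Trainer API (the checkpoint path and the eval_texts list below are placeholders for my actual setup):

```python
import math
from pathlib import Path

from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder validation texts; in practice this is my real validation set.
eval_texts = ["Example held-out sentence one.", "Another held-out sentence."]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

eval_dataset = Dataset.from_dict({"text": eval_texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

# Same masking as run_mlm.py: 15% of tokens are masked at random.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

for ckpt in sorted(Path("output").glob("checkpoint-*")):
    model = AutoModelForMaskedLM.from_pretrained(ckpt)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="eval_tmp", report_to="none"),
        eval_dataset=eval_dataset,
        data_collator=collator,
    )
    metrics = trainer.evaluate()
    # run_mlm.py computes perplexity as exp of the eval loss.
    print(ckpt.name, math.exp(metrics["eval_loss"]))
```

One thing I noticed while doing this: the masking is random, so the eval loss (and therefore the perplexity) varies a bit from run to run unless the seed is fixed, which could explain part of the difference I’m seeing.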

Thanks


Are you using a learning-rate scheduler? If the learning rate during the single-epoch run differs from the learning rate during the first epoch of the two-epoch run, you will get quite different results.
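
To make this concrete, here is a small sketch (the step count and learning rate are made up): with the default linear schedule, the learning rate at any given step depends on the *total* number of training steps, so after one epoch a 1-epoch run and a 2-epoch run are at very different points of the decay.

```python
import torch
from transformers import get_scheduler

STEPS_PER_EPOCH = 100  # hypothetical number of optimizer steps per epoch

def lr_after_one_epoch(total_epochs: int) -> float:
    # Dummy optimizer; we only care about the schedule itself.
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=5e-5)
    sched = get_scheduler(
        "linear",
        optimizer=opt,
        num_warmup_steps=0,
        num_training_steps=total_epochs * STEPS_PER_EPOCH,
    )
    for _ in range(STEPS_PER_EPOCH):
        opt.step()
        sched.step()
    return sched.get_last_lr()[0]

print(lr_after_one_epoch(1))  # 0.0     -- already decayed to zero
print(lr_after_one_epoch(2))  # 2.5e-05 -- only halfway through the decay
```

So the weights after epoch 1 of the two-epoch run were trained with consistently higher learning rates than the single-epoch model.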

Yes!
I didn’t realize that the default lr_scheduler_type was linear. I set --lr_scheduler_type to ā€˜constant’ and now the models are indeed the same.
Thank you very much for the (quick) response.

I’m still not sure how to get the perplexity of each checkpoint…
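
One approach that might work (I haven’t verified it across transformers versions): run_mlm.py passes all standard TrainingArguments through, so adding --evaluation_strategy epoch to the training command should make the Trainer evaluate at every epoch and record eval_loss in the trainer_state.json saved inside each checkpoint. Perplexity is then just exp(eval_loss), which is how run_mlm.py itself computes it:

```python
import json
import math

# Path is an example; trainer_state.json sits inside each saved checkpoint.
with open("output/checkpoint-200/trainer_state.json") as f:
    state = json.load(f)

# log_history holds one entry per logged or evaluated step.
for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(f"epoch {entry['epoch']}: perplexity {math.exp(entry['eval_loss']):.2f}")
```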