Different models when loading checkpoint (run_mlm)

I’m continuing training ‘bert-base-uncased’ with run_mlm.py.
I expected these two options to produce the same model:

  1. Training for 1 epoch.
  2. Training for 2 epochs, saving a checkpoint after the first epoch.

Why are the first model (1 epoch) and the first-epoch checkpoint from the second run different?

And another question - is there a way to get perplexity of each checkpoint?
I tried running the run_mlm script on each checkpoint (with the --do_eval flag, without the --do_train flag). It worked, though I’m not sure that’s the proper way…
The perplexity is quite different from what I get when training “in one shot” up to that checkpoint (as in option 1 above).
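For what it’s worth, run_mlm reports perplexity as the exponential of the mean evaluation loss, so given the eval_loss that each --do_eval run writes out, you can recover the perplexity yourself. A minimal sketch (the loss value here is hypothetical):

```python
import math

def perplexity(eval_loss: float) -> float:
    # run_mlm computes perplexity = exp(eval_loss)
    return math.exp(eval_loss)

print(perplexity(1.8))  # hypothetical eval loss; gives roughly 6.05
```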




Are you using a learning-rate scheduler? If the learning rate during the single epoch is different from the learning rate during the first epoch of two, then you will get quite different results.
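To see why, here is a minimal pure-Python sketch of the linear decay shape (no warmup) that the Trainer’s default "linear" scheduler follows; the base learning rate and steps-per-epoch values are illustrative:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    # Linearly decay from base_lr to 0 over total_steps (no warmup),
    # mirroring the shape of the default 'linear' scheduler.
    remaining = max(0.0, (total_steps - step) / total_steps)
    return base_lr * remaining

steps_per_epoch = 1000  # hypothetical

# Run A: 1 epoch  -> total_steps = 1000
# Run B: 2 epochs -> total_steps = 2000
# Compare the learning rate halfway through epoch 1 (step 500):
lr_a = linear_lr(500, 1 * steps_per_epoch)
lr_b = linear_lr(500, 2 * steps_per_epoch)
print(lr_a)  # 2.5e-05  -- already half decayed
print(lr_b)  # 3.75e-05 -- only a quarter decayed
```

So even though both runs start from the same weights, the gradient updates during epoch 1 differ, and the end-of-epoch-1 checkpoint of the 2-epoch run won’t match the 1-epoch model.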

I didn’t realize that the default lr_scheduler_type was linear. I set --lr_scheduler_type to ‘constant’ and now the models are indeed the same.
Thank you very much for the (quick) response.
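In case it helps anyone else, the invocation with a fixed learning rate might look like this (file paths and the output directory are placeholders):

```shell
python run_mlm.py \
  --model_name_or_path bert-base-uncased \
  --train_file train.txt \
  --do_train \
  --lr_scheduler_type constant \
  --output_dir ./mlm-output
```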

I’m still not sure how to get the perplexity of each checkpoint…