Different models when loading checkpoint (run_mlm)

I’m continuing the training of ā€˜bert-base-uncased’ with run_mlm.py.
I expected these two options to produce the same model:

  1. Training for 1 epoch.
  2. Training for 2 epochs and saving a checkpoint after the first epoch.

Why are the first model (1 epoch) and the first-epoch checkpoint from the second run different?


And another question: is there a way to get the perplexity of each checkpoint?
I tried running the run_mlm script on each checkpoint (with the --do_eval flag, without the --do_train flag). It worked, though I’m not sure that’s the proper way…
The perplexity is quite different from training ā€œin one shotā€ up to that checkpoint (as in option 1 above).
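
For reference, here is a minimal sketch of what I mean, doing the same per-checkpoint evaluation directly with the Trainer API (the checkpoint path and the eval_texts list below are placeholders for my actual setup):

```python
import math
from pathlib import Path

from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder validation texts; in practice this is my real validation set.
eval_texts = ["Example held-out sentence one.", "Another held-out sentence."]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

eval_dataset = Dataset.from_dict({"text": eval_texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

# Same masking as run_mlm.py: 15% of tokens are masked at random.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

for ckpt in sorted(Path("output").glob("checkpoint-*")):
    model = AutoModelForMaskedLM.from_pretrained(ckpt)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="eval_tmp", report_to="none"),
        eval_dataset=eval_dataset,
        data_collator=collator,
    )
    metrics = trainer.evaluate()
    # run_mlm.py computes perplexity as exp of the eval loss.
    print(ckpt.name, math.exp(metrics["eval_loss"]))
```

One thing I noticed while doing this: the masking is random, so the eval loss (and therefore the perplexity) varies a bit from run to run unless the seed is fixed, which could explain part of the difference I’m seeing.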

Thanks


Are you using a learning-rate scheduler? If the learning rate during the single-epoch run differs from the learning rate during the first epoch of the two-epoch run, you will get quite different results.
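
To make this concrete, here is a small sketch (the step count and learning rate are made up): with the default linear schedule, the learning rate at any given step depends on the *total* number of training steps, so after one epoch a 1-epoch run and a 2-epoch run are at very different points of the decay.

```python
import torch
from transformers import get_scheduler

STEPS_PER_EPOCH = 100  # hypothetical number of optimizer steps per epoch

def lr_after_one_epoch(total_epochs: int) -> float:
    # Dummy optimizer; we only care about the schedule itself.
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=5e-5)
    sched = get_scheduler(
        "linear",
        optimizer=opt,
        num_warmup_steps=0,
        num_training_steps=total_epochs * STEPS_PER_EPOCH,
    )
    for _ in range(STEPS_PER_EPOCH):
        opt.step()
        sched.step()
    return sched.get_last_lr()[0]

print(lr_after_one_epoch(1))  # 0.0     -- already decayed to zero
print(lr_after_one_epoch(2))  # 2.5e-05 -- only halfway through the decay
```

So the weights after epoch 1 of the two-epoch run were trained with consistently higher learning rates than the single-epoch model.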

Yes!
I didn’t realize that the default lr_scheduler_type was linear. I set --lr_scheduler_type to ā€˜constant’ and now the models are indeed the same.
Thank you very much for the (quick) response.

I’m still not sure how to get the perplexity of each checkpoint…
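
One approach that might work (I haven’t verified it across transformers versions): run_mlm.py passes all standard TrainingArguments through, so adding --evaluation_strategy epoch to the training command should make the Trainer evaluate at every epoch and record eval_loss in the trainer_state.json saved inside each checkpoint. Perplexity is then just exp(eval_loss), which is how run_mlm.py itself computes it:

```python
import json
import math

# Path is an example; trainer_state.json sits inside each saved checkpoint.
with open("output/checkpoint-200/trainer_state.json") as f:
    state = json.load(f)

# log_history holds one entry per logged or evaluated step.
for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(f"epoch {entry['epoch']}: perplexity {math.exp(entry['eval_loss']):.2f}")
```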