Different models when loading checkpoint (run_mlm)

I'm continuing training 'bert-base-uncased' with run_mlm.py.
I expected these two options to produce the same model:

  1. Training for 1 epoch.
  2. Training for 2 epochs and saving checkpoint after first epoch.

Why are the first model (1 epoch) and the first-epoch checkpoint from the second run different?


And another question: is there a way to get the perplexity of each checkpoint?
I tried running the run_mlm.py script on each checkpoint (with the --do_eval flag, without the --do_train flag). It worked, though I'm not sure that's the proper way…
The perplexity is quite different from the scenario of training "in one shot" up to the same checkpoint (option 1 above).
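
For reference, here is one way to script this kind of per-checkpoint evaluation with the Trainer API directly instead of re-invoking run_mlm.py. This is a minimal sketch, not the script's own code: the "output" directory layout, the "eval.txt" validation file, and the max_length are assumptions you would adapt to your setup. Perplexity is computed as exp(eval_loss), the same formula run_mlm.py uses for its reported perplexity.

```python
import math
from pathlib import Path

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed layout: run_mlm.py saved checkpoint-* folders under this directory,
# and eval.txt is your validation file. Adjust both to your setup.
output_dir = Path("output")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

raw = load_dataset("text", data_files={"validation": "eval.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

eval_ds = raw["validation"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Evaluate every saved checkpoint, in step order.
for ckpt in sorted(output_dir.glob("checkpoint-*"),
                   key=lambda p: int(p.name.split("-")[1])):
    model = AutoModelForMaskedLM.from_pretrained(ckpt)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="tmp_eval", report_to=[], seed=42),
        eval_dataset=eval_ds,
        data_collator=collator,
    )
    metrics = trainer.evaluate()
    # Perplexity = exp(masked-LM cross-entropy loss).
    print(f"{ckpt.name}: perplexity = {math.exp(metrics['eval_loss']):.2f}")
```

One caveat: DataCollatorForLanguageModeling masks tokens at random, so evaluating the same checkpoint twice can give slightly different perplexities; fixing the seed keeps the numbers comparable across checkpoints.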

Thanks

Are you using a learning-rate scheduler? If the learning rate during the single epoch differs from the learning rate during the first epoch of two, you will get quite different results.
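
To illustrate, here is a small sketch (with a made-up step count) using transformers' get_scheduler, which is what Trainer builds for lr_scheduler_type under the hood: with the default linear schedule, the learning rate at each step depends on the total number of training steps, so the first epoch of a two-epoch run decays more slowly than a one-epoch run.

```python
import torch
from transformers import get_scheduler

steps_per_epoch = 100  # made-up number of optimizer steps per epoch
param = torch.nn.Parameter(torch.zeros(1))

for total_epochs in (1, 2):
    optimizer = torch.optim.AdamW([param], lr=5e-5)
    scheduler = get_scheduler(
        "linear",
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=steps_per_epoch * total_epochs,
    )
    for _ in range(steps_per_epoch):  # step through the first epoch only
        optimizer.step()
        scheduler.step()
    print(f"{total_epochs}-epoch run, LR at end of epoch 1: "
          f"{scheduler.get_last_lr()[0]:.2e}")

# Linear decay reaches ~0 by the end of a 1-epoch run, but is still at
# ~2.5e-5 halfway through a 2-epoch run, so every update in the first
# epoch differs between the two runs.
```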

Yes!
I didn't realize that the default lr_scheduler_type was linear. I set --lr_scheduler_type to 'constant' and now the models are indeed the same.
Thank you very much for the (quick) response.

I'm still not sure how to get the perplexity of each checkpoint…