I’m continuing to train `bert-base-uncased` with run_mlm.py.
I expected these two options to produce the same model:
- Training for 1 epoch.
- Training for 2 epochs and saving checkpoint after first epoch.
Why is the first model (1 epoch) different from the epoch-1 checkpoint of the second run?
And another question - is there a way to get the perplexity of each checkpoint?
I tried running the run_mlm script on each checkpoint (with the --do_eval flag and without --do_train). It worked, though I’m not sure that’s the proper way…
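For context, this is roughly the loop I used. The checkpoint paths and dataset flags are placeholders (the real run uses the same dataset arguments as training), and I only print the commands here rather than executing them:

```shell
# Sketch of evaluating each checkpoint with run_mlm.py (eval only, no --do_train).
# Checkpoint paths and dataset flags below are placeholders; shown as a dry run.
for ckpt in output/checkpoint-1000 output/checkpoint-2000; do
  cmd="python run_mlm.py \
  --model_name_or_path $ckpt \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --do_eval \
  --output_dir eval_results/${ckpt##*/}"
  echo "$cmd"
done
```

Each invocation writes an eval_results file with the eval loss and perplexity into its own output directory, so the checkpoints don’t overwrite each other.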
The perplexity is quite different from the scenario of training “in one shot” up to that same point (option 1 above).
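For comparing the numbers, it may help that run_mlm.py (at least in the versions I’ve looked at) derives perplexity from the evaluation loss as exp(loss), so even a small loss gap between two checkpoints shows up multiplicatively in perplexity. A minimal sketch with made-up loss values:

```python
import math

# run_mlm.py reports perplexity as exp(eval_loss) on the masked tokens.
# The loss values below are invented, just to show the relationship.
loss_one_shot = 1.80    # eval loss after training 1 epoch in a single run
loss_checkpoint = 1.85  # eval loss at the epoch-1 checkpoint of a 2-epoch run

ppl_one_shot = math.exp(loss_one_shot)
ppl_checkpoint = math.exp(loss_checkpoint)

# A 0.05 difference in loss is roughly a 5% difference in perplexity.
print(ppl_one_shot, ppl_checkpoint)
```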