I’m using the example command line supplied with run_mlm.py (v4.4.1), but I’ve enabled more frequent logging as well as evaluation at the end of each epoch:
python run_mlm.py --model_name_or_path roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--output_dir test-wikitest \
--do_train \
--do_eval \
--evaluation_strategy epoch \
--logging_steps 10
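For reference, run_mlm.py parses these flags with HfArgumentParser, so they end up roughly as the following TrainingArguments (a sketch covering only the flags above; the script fills in many more defaults):

from transformers import TrainingArguments

# Rough equivalent of the CLI flags above (sketch only; run_mlm.py
# builds this via HfArgumentParser rather than constructing it directly).
training_args = TrainingArguments(
    output_dir="test-wikitest",
    do_train=True,
    do_eval=True,
    evaluation_strategy="epoch",  # run evaluation at the end of every epoch
    logging_steps=10,             # log metrics every 10 training steps
    num_train_epochs=3,           # the default mentioned below, made explicit
)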
By default, this is configured to train for 3 epochs. When I run it to completion, I find that the eval_loss calculated at the end of the last epoch is reported as:
[INFO|trainer.py:1775] 2021-03-18 18:16:19,116 >> ***** Running Evaluation *****
[INFO|trainer.py:1776] 2021-03-18 18:16:19,116 >> Num examples = 496
[INFO|trainer.py:1777] 2021-03-18 18:16:19,116 >> Batch size = 8
{'eval_loss': 1.2713477611541748, 'eval_runtime': 27.0394, 'eval_samples_per_second': 18.344, 'epoch': 3.0}
Immediately following this, the final evaluation (because I specified --do_eval) is executed. But the eval_loss it reports is different:
[INFO|trainer.py:1775] 2021-03-18 18:16:47,957 >> ***** Running Evaluation *****
[INFO|trainer.py:1776] 2021-03-18 18:16:47,957 >> Num examples = 496
[INFO|trainer.py:1777] 2021-03-18 18:16:47,957 >> Batch size = 8
[INFO|trainer_pt_utils.py:650] 2021-03-18 18:17:15,018 >> ***** eval metrics *****
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> epoch = 3.0
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_loss = 1.2644
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_mem_cpu_alloc_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_mem_cpu_peaked_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_mem_gpu_alloc_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_mem_gpu_peaked_delta = 1584MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_runtime = 26.8687
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_samples = 496
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,018 >> eval_samples_per_second = 18.46
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:17:15,019 >> perplexity = 3.5409
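(One thing that does line up: run_mlm.py reports perplexity as the exponential of the eval loss, and that checks out here:)

import math
print(math.exp(1.2644))  # ≈ 3.5409, matching the reported perplexity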
I was expecting the two eval_loss values to be identical, since no training happens between the two back-to-back evaluations and the validation set is the same.
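To make that expectation concrete: if I were driving the Trainer directly, I would have expected back-to-back calls like the following to agree (a minimal sketch; trainer here stands in for the Trainer instance run_mlm.py builds, with the trained model and eval dataset already set up):

# Sketch of my expectation, not actual run_mlm.py code: `trainer` is
# assumed to be a transformers.Trainer holding the trained model and
# the fixed validation set.
metrics_first = trainer.evaluate()
metrics_second = trainer.evaluate()
# Nothing trains in between, so I expected the losses to match:
assert metrics_first["eval_loss"] == metrics_second["eval_loss"]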
Further, when I invoke run_mlm.py with the trained model checkpoint and do only an evaluation (no training at all):
python run_mlm.py --model_name_or_path test-wikitest \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--output_dir test-wikitest-evalonly \
--do_eval
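Here --model_name_or_path points at the output directory of the training run above, so run_mlm.py reloads the fine-tuned checkpoint, roughly as in:

from transformers import AutoModelForMaskedLM

# Load the fine-tuned weights saved by the earlier training run.
model = AutoModelForMaskedLM.from_pretrained("test-wikitest")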
I get yet another value for eval_loss:
[INFO|trainer.py:1775] 2021-03-18 18:46:12,713 >> ***** Running Evaluation *****
[INFO|trainer.py:1776] 2021-03-18 18:46:12,713 >> Num examples = 496
[INFO|trainer.py:1777] 2021-03-18 18:46:12,714 >> Batch size = 8
[INFO|trainer_pt_utils.py:650] 2021-03-18 18:46:35,544 >> ***** eval metrics *****
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_loss = 1.287
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_mem_cpu_alloc_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_mem_cpu_peaked_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_mem_gpu_alloc_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_mem_gpu_peaked_delta = 1584MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_runtime = 22.6793
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_samples = 496
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> eval_samples_per_second = 21.87
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> init_mem_cpu_alloc_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> init_mem_cpu_peaked_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> init_mem_gpu_alloc_delta = 476MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> init_mem_gpu_peaked_delta = 0MB
[INFO|trainer_pt_utils.py:655] 2021-03-18 18:46:35,545 >> perplexity = 3.6218
Can anyone tell me what the source of the non-determinism is?
Thanks in advance!