MRPC Reproducibility with transformers-4.1.0

I consistently get lower accuracy and F1 than the documented results when following the MRPC example. What could be the reason?

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
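
(For the command above to work, $TASK_NAME has to be set first; the official text-classification example sets it with:

export TASK_NAME=mrpc
)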

and get the following results across three runs:

12/18/2020 17:16:38 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 17:16:38 - INFO - __main__ -     eval_loss = 0.5318707227706909
12/18/2020 17:16:38 - INFO - __main__ -     eval_accuracy = 0.7622549019607843
12/18/2020 17:16:38 - INFO - __main__ -     eval_f1 = 0.8417618270799347
12/18/2020 17:16:38 - INFO - __main__ -     eval_combined_score = 0.8020083645203595
12/18/2020 17:16:38 - INFO - __main__ -     epoch = 3.0

12/18/2020 16:45:29 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 16:45:29 - INFO - __main__ -     eval_loss = 0.47723284363746643
12/18/2020 16:45:29 - INFO - __main__ -     eval_accuracy = 0.8063725490196079
12/18/2020 16:45:29 - INFO - __main__ -     eval_f1 = 0.868988391376451
12/18/2020 16:45:29 - INFO - __main__ -     eval_combined_score = 0.8376804701980294
12/18/2020 16:45:29 - INFO - __main__ -     epoch = 3.0

12/18/2020 16:34:37 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 16:34:37 - INFO - __main__ -     eval_loss = 0.571368932723999
12/18/2020 16:34:37 - INFO - __main__ -     eval_accuracy = 0.6838235294117647
12/18/2020 16:34:37 - INFO - __main__ -     eval_f1 = 0.8122270742358079
12/18/2020 16:34:37 - INFO - __main__ -     eval_combined_score = 0.7480253018237863
12/18/2020 16:34:37 - INFO - __main__ -     epoch = 3.0
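
For what it's worth, the three runs above already differ by several points, which is typical for MRPC: the training set has only ~3.7k examples, so fine-tuning results swing noticeably with the random seed. The 16:34 run even looks like a degenerate one: 68.38% accuracy with F1 = 0.8122 is exactly what you get on the MRPC dev set by predicting "paraphrase" for every pair, a known failure mode of BERT fine-tuning on small GLUE tasks. One way to check whether the gap is just seed variance is to sweep a few seeds and compare; run_glue.py picks up the --seed argument from TrainingArguments (default 42). A minimal sketch of the same command as above, with arbitrary seed values:

# Repeat the same fine-tuning run with several random seeds
# and compare the eval scores across output directories.
for SEED in 1 2 3 42 123; do
  python run_glue.py \
    --model_name_or_path bert-base-cased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --seed $SEED \
    --output_dir /tmp/$TASK_NAME-seed-$SEED/
done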

GPU: GTX 1080
transformers: 4.1.0
PyTorch: 1.6.0
Python: 3.8
OS: Ubuntu 18.04

Please don’t use two different accounts to post the exact same message in two different categories.