MRPC Reproducibility with transformers-4.1.0

I consistently get lower accuracy and F1 than the documented results when following the MRPC example. What could be the reason?

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
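
(For the command above to work, $TASK_NAME has to be set first; the official text-classification example sets it with:

export TASK_NAME=mrpc
)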

and get the following results across three runs:

12/18/2020 17:16:38 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 17:16:38 - INFO - __main__ -     eval_loss = 0.5318707227706909
12/18/2020 17:16:38 - INFO - __main__ -     eval_accuracy = 0.7622549019607843
12/18/2020 17:16:38 - INFO - __main__ -     eval_f1 = 0.8417618270799347
12/18/2020 17:16:38 - INFO - __main__ -     eval_combined_score = 0.8020083645203595
12/18/2020 17:16:38 - INFO - __main__ -     epoch = 3.0

12/18/2020 16:45:29 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 16:45:29 - INFO - __main__ -     eval_loss = 0.47723284363746643
12/18/2020 16:45:29 - INFO - __main__ -     eval_accuracy = 0.8063725490196079
12/18/2020 16:45:29 - INFO - __main__ -     eval_f1 = 0.868988391376451
12/18/2020 16:45:29 - INFO - __main__ -     eval_combined_score = 0.8376804701980294
12/18/2020 16:45:29 - INFO - __main__ -     epoch = 3.0

12/18/2020 16:34:37 - INFO - __main__ -   ***** Eval results mrpc *****
12/18/2020 16:34:37 - INFO - __main__ -     eval_loss = 0.571368932723999
12/18/2020 16:34:37 - INFO - __main__ -     eval_accuracy = 0.6838235294117647
12/18/2020 16:34:37 - INFO - __main__ -     eval_f1 = 0.8122270742358079
12/18/2020 16:34:37 - INFO - __main__ -     eval_combined_score = 0.7480253018237863
12/18/2020 16:34:37 - INFO - __main__ -     epoch = 3.0
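
For what it's worth, the three runs above already differ by several points, which is typical for MRPC: the training set has only ~3.7k examples, so fine-tuning results swing noticeably with the random seed. The 16:34 run even looks like a degenerate one: 68.38% accuracy with F1 = 0.8122 is exactly what you get on the MRPC dev set by predicting "paraphrase" for every pair, a known failure mode of BERT fine-tuning on small GLUE tasks. One way to check whether the gap is just seed variance is to sweep a few seeds and compare; run_glue.py picks up the --seed argument from TrainingArguments (default 42). A minimal sketch of the same command as above, with arbitrary seed values:

# Repeat the same fine-tuning run with several random seeds
# and compare the eval scores across output directories.
for SEED in 1 2 3 42 123; do
  python run_glue.py \
    --model_name_or_path bert-base-cased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --seed $SEED \
    --output_dir /tmp/$TASK_NAME-seed-$SEED/
done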

GPU: GTX 1080
transformers: 4.1.0
PyTorch: 1.6.0
Python: 3.8
OS: Ubuntu 18.04

Please don’t use two different accounts to post the exact same message in two different categories.