I am fine-tuning BertForSequenceClassification using run_glue.py, and I would like to output all of my model's performance metrics every logging_steps.
Currently, in the logs I see entries like {'loss': 0.1867809295654297, 'learning_rate': 1.3071895424836603e-07, 'epoch': 2.980392156862745, 'total_flos': 2121344853980160, 'step': 456}
for the training loss and {'eval_loss': 0.4489470714636048, 'eval_mcc': 0.6251852674757565, 'epoch': 3.0, 'total_flos': 2133905557962240, 'step': 459}
for the evaluation loss.
They are printed separately, and the validation losses only appear in the output at the end of the run.
Where is the actual logging taking place in trainer.py? I'd like to know so that I can output a single dictionary containing all the metrics.
I am using transformers 3.3.0 and run_glue.py with the flag --evaluation_strategy steps, setting a low value (32) for both --logging_steps and --eval_steps. I am confused because evaluation on the validation set doesn't seem to occur every eval_steps.
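In case it helps, those flags correspond roughly to the following TrainingArguments (a minimal sketch rather than my exact invocation; output_dir is a placeholder and all other arguments are left at their defaults):

from transformers import TrainingArguments

# Rough equivalent of: --evaluation_strategy steps --logging_steps 32 --eval_steps 32
training_args = TrainingArguments(
    output_dir="./output",        # placeholder
    evaluation_strategy="steps",  # evaluate during training, every eval_steps
    logging_steps=32,             # log training loss/lr/epoch every 32 optimization steps
    eval_steps=32,                # evaluate on the validation set every 32 optimization steps
)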
logging_steps and eval_steps have different meanings:
logging_steps only logs the training loss, learning rate, epoch, etc., not the eval metrics; eval_steps logs the metrics on the validation set.
Here the steps refer to actual optimization steps, so if you are using 2 gradient accumulation steps and your batch size is 4, then 1 optimization step corresponds to 8 training samples; in that case, if your eval_steps is 2, the eval metrics will be logged after every 16 samples.
In the latest version, if eval_steps is not specified, it'll be set to logging_steps by default.
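In code, the arithmetic in that example works out like this (a sketch with the same illustrative numbers, assuming a single device):

# eval_steps counts optimization steps, so the number of training samples seen
# between two evaluations is batch_size * grad_accumulation * eval_steps.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
eval_steps = 2

samples_per_optimization_step = per_device_train_batch_size * gradient_accumulation_steps  # 8
samples_between_evals = samples_per_optimization_step * eval_steps                         # 16
print(samples_per_optimization_step, samples_between_evals)  # 8 16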
Validation metrics (e.g. eval_mcc, eval_loss) do not appear in the stdout logs before the end of the run, whereas training losses are displayed (as expected) every 8 trainer steps. Is this intended behavior?
Shouldn't the eval metrics appear every eval_steps * per_device_train_batch_size samples (or something along those lines)?
Thanks! Trying out my notebook above with the latest version (3.4.0), everything seems in order with evaluation_strategy=steps, but that wasn't the case with 3.3.0, at least in my attempts.
So the update seems to have cleared this up. I haven't checked which change in the code is responsible for the difference in behavior I observed.
I am facing exactly the same problem with run_qa.py: evaluation results are only printed at the end of training. I tried a number of configs, including --evaluation_strategy steps, but none helped.
hey @lewtun,
Thanks for your reply. When running the above I get train_results.json and eval_results.json with the following content:
train_results.json
{
  "epoch": 2.0,
  "init_mem_cpu_alloc_delta": 2540351488,
  "init_mem_cpu_peaked_delta": 0,
  "init_mem_gpu_alloc_delta": 266590720,
  "init_mem_gpu_peaked_delta": 0,
  "train_mem_cpu_alloc_delta": 16502784,
  "train_mem_cpu_peaked_delta": 331776,
  "train_mem_gpu_alloc_delta": 822129664,
  "train_mem_gpu_peaked_delta": 3155901440,
  "train_runtime": 18.1442,
  "train_samples": 168,
  "train_samples_per_second": 1.543
}
hey @Mariam, my understanding is that the reason you see both the training and validation losses in the tutorial is that it runs in a Jupyter notebook, so the Trainer uses the NotebookProgressCallback (link) to display the results during training.
when running the script, the PrinterCallback (link) is used instead, and this callback only logs the training loss, learning rate, and epoch number.
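one way to surface the eval metrics when running a script is to attach a small custom callback that prints every logged dict; here's a minimal sketch (the class name is just illustrative, and it relies on the standard TrainerCallback API):

from transformers import TrainerCallback

class PrintAllMetricsCallback(TrainerCallback):
    # Print every dict the Trainer logs, training and evaluation metrics alike.
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            print(f"step {state.global_step}: {logs}")

# attach it before training, e.g. trainer.add_callback(PrintAllMetricsCallback())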
since you’re doing question-answering, my suggestion would be to either: