How to monitor both train and validation metrics at the same step?


I am finetuning BertForSequenceClassification and I would like to output, every logging_steps, all the performance metrics of my model.

Currently, in the logs I see entries like
{'loss': 0.1867809295654297, 'learning_rate': 1.3071895424836603e-07, 'epoch': 2.980392156862745, 'total_flos': 2121344853980160, 'step': 456}
for the training loss and
{'eval_loss': 0.4489470714636048, 'eval_mcc': 0.6251852674757565, 'epoch': 3.0, 'total_flos': 2133905557962240, 'step': 459}
for the evaluation loss.

They are being printed separately, with validation losses in output only at the end of the run.

Where does the actual logging take place? I’d like to know so that I can output a single dictionary containing all the metrics.

I am using transformers 3.3.0 with the flag --evaluation_strategy steps, setting a low value (32) for both --logging_steps and --eval_steps. I am confused because evaluation on the validation set doesn’t seem to occur every eval_steps.

I reviewed the thread “Trainer doesn’t show the loss at each step” but I am still not sure how to do this.

Hi @davidefiocco

logging_steps and eval_steps have different meanings:

logging_steps only logs the train loss, learning rate, epoch etc., not the metrics;
eval_steps logs the metrics on the validation set.

Here the steps refer to actual optimization steps, so if you are using 2 gradient accumulation steps and your batch size is 4, then 1 optimization step covers 8 examples; in that case, with eval_steps set to 2, the metrics will be logged after 16 examples (2 optimizer updates).
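The batch-size arithmetic above can be sketched as follows (the function name is illustrative, not a transformers API):

```python
def samples_per_optimization_step(per_device_batch_size, grad_accum_steps, n_devices=1):
    """One optimizer update consumes this many training examples."""
    return per_device_batch_size * grad_accum_steps * n_devices

# batch size 4 with 2 gradient-accumulation steps:
per_step = samples_per_optimization_step(4, 2)   # 8 examples per optimizer update
examples_before_eval = per_step * 2              # with eval_steps = 2 -> 16 examples
print(per_step, examples_before_eval)            # 8 16
```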

In the latest version, if eval_steps is not specified it’ll be set to logging_steps by default.

logging is done in the Trainer.log method,

and it’s invoked in a couple of places inside the train method.

Hope this helps.
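To get everything into a single dictionary, one option is a custom callback hooked into on_log. This is only a sketch (the class and function names are mine, not transformers APIs), assuming a transformers version that exposes the TrainerCallback interface; the try/except stand-in just lets the sketch run without transformers installed:

```python
try:
    from transformers import TrainerCallback
except ImportError:
    class TrainerCallback:  # minimal stand-in so the sketch runs anywhere
        pass

def merge_logs_by_step(entries):
    """Merge logged dicts that share the same 'step' into one dict each."""
    merged = {}
    for logs in entries:
        merged.setdefault(logs.get("step"), {}).update(logs)
    return merged

class SingleDictLogCallback(TrainerCallback):
    """Collects every logged dict so train and eval metrics emitted at the
    same global step can be merged into a single dictionary afterwards."""
    def __init__(self):
        self.entries = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        self.entries.append({**(logs or {}), "step": state.global_step})

# hypothetical usage: trainer.add_callback(SingleDictLogCallback())
```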

Hi @valhalla, thanks for your reply, but I am still puzzled.

If I run

python --model_name_or_path bert-base-cased \
  --task_name CoLA \
  --data_dir ./CoLA \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir output \
  --evaluation_strategy steps \
  --logging_steps 8 \
  --eval_steps 2

validation metrics (e.g. eval_mcc, eval_loss) do not appear in the stdout logs before the end of the run, whereas training losses are displayed (as expected) every 8 trainer steps. Is this intended behavior?

Shouldn’t eval metrics appear every eval_steps * per_device_train_batch_size steps (or something along these lines)?

Interesting, I’ve just used Trainer for another task and the eval logs were printed as expected.

Could you maybe try again and see?

It seems that one has to use --evaluate_during_training in order for this to work. I was sidetracked by the warning

"FutureWarning: The 'evaluate_during_training' argument is deprecated in favor of 'evaluation_strategy'"

appearing in the run, which pushed me to remove it, but its presence is what makes the eval losses/metrics get printed.

Setting evaluation_strategy to steps should achieve the same effect as setting --evaluate_during_training. If not, please report again and tag Sylvain.
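For reference, a minimal sketch of the newer configuration in code form (exact flag availability depends on the transformers version installed):

```python
from transformers import TrainingArguments

# newer style: periodic evaluation via the strategy argument
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",  # replaces the deprecated evaluate_during_training=True
    logging_steps=8,
    eval_steps=8,
)
```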

Thanks! Trying out my notebook above with the latest version (3.4.0), everything seems in order with evaluation_strategy=steps, but that wasn’t the case with 3.3.0, at least in my attempts.
So the update seems to have cleared this up. I haven’t checked which change in the code is responsible for the difference in behavior I observed.

I fixed a lot of small bugs in Trainer these past weeks so it was probably one of them :wink:


Thanks @valhalla and @sgugger for the fix!

I’ll leave here a (hopefully permanent) link to a notebook reproducing the issue I had with 3.3.0.

I am facing exactly the same problem: evaluation results are only printed at the end of training. I tried a number of configs, including --evaluation_strategy steps, but none helped.

Hi @sgugger ,
Can you kindly answer this question,

hey @Mariam, as described in the docs for TrainingArguments you may also need to configure eval_steps in addition to evaluation_strategy.

for example, to evaluate every 100 steps you could try the following:

python \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/ \
  --evaluation_strategy steps \
  --eval_steps 100

by default, the trainer will evaluate every 500 steps (the value of logging_steps) if eval_steps is not specified


hey @lewtun ,
Thanks for your reply. When running the above I get train_results.json and eval_results.json with the following content:

train_results.json:

{
  "epoch": 2.0,
  "init_mem_cpu_alloc_delta": 2540351488,
  "init_mem_cpu_peaked_delta": 0,
  "init_mem_gpu_alloc_delta": 266590720,
  "init_mem_gpu_peaked_delta": 0,
  "train_mem_cpu_alloc_delta": 16502784,
  "train_mem_cpu_peaked_delta": 331776,
  "train_mem_gpu_alloc_delta": 822129664,
  "train_mem_gpu_peaked_delta": 3155901440,
  "train_runtime": 18.1442,
  "train_samples": 168,
  "train_samples_per_second": 1.543
}

eval_results.json:

{
  "epoch": 2.0,
  "eval_samples": 42,
  "exact_match": 39.02439024390244,
  "f1": 62.318702403264744
}

I would like to have the losses and metrics on both the train and eval datasets; at the moment it only returns them for the evaluation dataset.

I created a Colab notebook to show my problem. What I am really after is plotting the train/val losses in wandb.

hey @Mariam, as far as I know you’ll have to create your own Trainer to evaluate the metrics on the training set - see this thread for a related discussion: Logging training accuracy using Trainer class
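That said, a lightweight alternative to subclassing is calling evaluate() twice, once per split, with distinct metric prefixes. A sketch, assuming a transformers version whose Trainer.evaluate accepts the eval_dataset and metric_key_prefix keyword arguments (metric_key_prefix arrived in later releases than 3.3):

```python
def metrics_on_both_splits(trainer, train_dataset, eval_dataset):
    """Return one dict containing both train_* and eval_* metrics."""
    train_metrics = trainer.evaluate(eval_dataset=train_dataset,
                                     metric_key_prefix="train")
    eval_metrics = trainer.evaluate(eval_dataset=eval_dataset)
    return {**train_metrics, **eval_metrics}
```

Note that evaluating on the full training set can be slow; a fixed subsample of it may be enough for monitoring.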

Hi @lewtun, even the loss? In the official tutorial trainer.train() seems to be returning train and validation losses.

hey @Mariam, my understanding is that the reason you see both the training and validation losses in the tutorial is because it is a Jupyter notebook, so the Trainer uses the NotebookProgressCallback to display the results during training.

when running the script, the PrinterCallback is used instead, and this callback only logs the training loss, learning rate and epoch number.

since you’re doing question-answering, my suggestion would be to either:

I tried defining my own callback. However, I got stuck on which event I should use (on epoch end?) and where to get the train and eval losses from.

it’s up to you, although I’d do it on_evaluate so that you can easily control the logging frequency with eval_steps. You could try subclassing TrainerCallback and copy-pasting / adapting the code from the notebook callback here: transformers/ at d9c62047a8d75e18d2849d345ab3394875a712ef · huggingface/transformers · GitHub
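A possible callback along those lines (a sketch of my own, not the code from the linked notebook): on each evaluation, pull the most recent training-loss entry out of state.log_history and print it alongside the eval metrics. The try/except stand-in just lets the sketch run without transformers installed.

```python
try:
    from transformers import TrainerCallback
except ImportError:
    class TrainerCallback:  # minimal stand-in so the sketch runs anywhere
        pass

def latest_train_loss(log_history):
    """Most recent entry in log_history that carries the training 'loss' key."""
    return next((h for h in reversed(log_history) if "loss" in h), None)

class TrainEvalLossCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        train = latest_train_loss(state.log_history) or {}
        combined = {**train, **(metrics or {})}
        print(f"step {state.global_step}: {combined}")

# hypothetical usage: trainer.add_callback(TrainEvalLossCallback())
```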

The problem I faced is that the loss is not logged in the metrics to start with.
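Even when the loss is missing from the metrics dict, the running training loss does end up in trainer.state.log_history (a plain list of dicts, like the entries shown at the top of this thread). A sketch, assuming that structure, for pulling out both curves to send to wandb:

```python
def loss_curves(log_history):
    """Split Trainer.state.log_history into (train, eval) loss series
    as lists of (step, loss) pairs."""
    train = [(h.get("step"), h["loss"]) for h in log_history if "loss" in h]
    evals = [(h.get("step"), h["eval_loss"]) for h in log_history if "eval_loss" in h]
    return train, evals
```

Each pair could then be logged with wandb.log({"train_loss": loss}, step=step) and likewise for the eval series.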