How to monitor both train and validation metrics at the same step?

Hi,

I am fine-tuning BertForSequenceClassification using run_glue.py, and I would like all of my model's performance metrics to be output every logging_steps.

Currently, in the logs I see entries like
{'loss': 0.1867809295654297, 'learning_rate': 1.3071895424836603e-07, 'epoch': 2.980392156862745, 'total_flos': 2121344853980160, 'step': 456}
for the training loss and
{'eval_loss': 0.4489470714636048, 'eval_mcc': 0.6251852674757565, 'epoch': 3.0, 'total_flos': 2133905557962240, 'step': 459}
for the evaluation loss.

They are printed separately, and the validation losses only show up in the output at the end of the run.

Where does the actual logging take place in trainer.py? I'd like to know so that I can output a single dictionary containing all the metrics.
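Ideally, every logging_steps I would get one combined entry along these lines (illustrative only, merging the two dictionaries above, not actual output):

{'loss': 0.1868, 'eval_loss': 0.4489, 'eval_mcc': 0.6252, 'epoch': 3.0, 'step': 459}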

I am using transformers 3.3.0 and run_glue.py with the flag --evaluation_strategy steps, setting a low value (32) for both --logging_steps and --eval_steps. I am confused because evaluation on the validation set doesn't seem to occur every eval_steps.

I went through Trainer doesn't show the loss at each step but I am still not sure how to do this.

Hi @davidefiocco

logging_steps and eval_steps have different meanings:

logging_steps only logs the train loss, learning rate, epoch, etc., not the metrics;
eval_steps logs the metrics on the validation set.

Here the steps refer to actual optimization steps, so if you are using 2 gradient accumulation steps and your batch size is 4, then one optimization step consumes 2 * 4 = 8 examples; in that case, if your eval_steps is 2, the metrics will be logged after 16 examples.
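To make the arithmetic concrete, here is the same computation in plain Python (a sketch with made-up numbers, not Trainer code):

per_device_train_batch_size = 4
gradient_accumulation_steps = 2

# examples consumed by one optimization step: 2 * 4 = 8
examples_per_optimization_step = per_device_train_batch_size * gradient_accumulation_steps

eval_steps = 2  # counted in optimization steps
print(eval_steps * examples_per_optimization_step)  # metrics logged after 16 examples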

In the latest version, if eval_steps is not specified, it is set to logging_steps by default.
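For example (a sketch assuming a 3.2+ TrainingArguments; the values are made up):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    logging_steps=32,
    # eval_steps left unset: it falls back to logging_steps (32),
    # so train and eval logs land on the same steps
)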

Logging is done in the log method here, and it's invoked here and here in the train method.
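So if you want everything in a single dictionary, one option is to subclass Trainer and merge the eval metrics into the training log. A minimal sketch (CombinedLoggingTrainer is my own name; it assumes a 3.x Trainer that exposes log() and evaluate()):

from transformers import Trainer

class CombinedLoggingTrainer(Trainer):
    def log(self, logs, *args, **kwargs):
        # A training log carries "loss"; run an evaluation then and
        # merge its eval_* metrics into the same dictionary.
        # evaluate() internally calls log() again, but only with
        # eval_* keys, so this guard does not recurse (the eval
        # metrics will also appear once as their own entry).
        if "loss" in logs:
            logs.update(self.evaluate())
        super().log(logs, *args, **kwargs)

Keep in mind this runs a full evaluation pass at every logging step, so it can get slow with a low logging_steps.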

Hope this helps.

Hi @valhalla, thanks for your reply, but I am still puzzled.

If I run

python run_glue.py --model_name_or_path bert-base-cased \
    --task_name CoLA \
    --do_train \
    --do_eval \
    --data_dir ./CoLA \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir output \
    --evaluation_strategy steps \
    --logging_steps 8 \
    --eval_steps 2

validation metrics (e.g. eval_mcc, eval_loss) do not appear in the stdout logs before the end of the run, whereas training losses are displayed (as expected) every 8 trainer steps. Is this intended behavior?

Shouldn't the eval metrics appear every eval_steps * per_device_train_batch_size (or something along those lines)?

Interesting, I've just used Trainer for another task and the eval logs were printed as expected.

Could you maybe try again and see?

It seems that one has to use --evaluate_during_training for this to work. I was quite sidetracked by the warning

"FutureWarning: The 'evaluate_during_training' argument is deprecated in favor of 'evaluation_strategy'

appearing in the run, which pushed me to remove the flag; but its presence is precisely what makes the eval losses/metrics get printed.

Setting evaluation_strategy to steps should achieve the same effect as setting --evaluate_during_training. If it doesn't, please report again and tag Sylvain.
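In TrainingArguments terms the equivalence looks like this (a sketch; the deprecated spelling still works for now but emits the FutureWarning quoted above):

from transformers import TrainingArguments

# deprecated spelling (triggers the FutureWarning)
old_args = TrainingArguments(output_dir="output", evaluate_during_training=True)

# current equivalent
new_args = TrainingArguments(output_dir="output", evaluation_strategy="steps")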

Thanks! Trying out my notebook above with the latest version (3.4.0), everything seems in order with evaluation_strategy=steps, but that wasn't the case with 3.3.0, at least in my attempts.
So the update seems to have cleared this up. I haven't checked which change in the code is responsible for the difference in behavior I observe.

I fixed a lot of small bugs in Trainer these past weeks so it was probably one of them :wink:


Thanks @valhalla and @sgugger for the fix!

I leave here a (hopefully permanent) link to a notebook that reproduces the issue I had with 3.3.0: https://colab.research.google.com/gist/davidefiocco/3bbe492033b5675ab03405019a71f9ce/colafinetuning.ipynb