I am fine-tuning BertForSequenceClassification using run_glue.py, and I would like to output all of my model's performance metrics every logging_steps.
Currently, in the logs I see entries like {'loss': 0.1867809295654297, 'learning_rate': 1.3071895424836603e-07, 'epoch': 2.980392156862745, 'total_flos': 2121344853980160, 'step': 456}
for the training loss and {'eval_loss': 0.4489470714636048, 'eval_mcc': 0.6251852674757565, 'epoch': 3.0, 'total_flos': 2133905557962240, 'step': 459}
for the evaluation loss.
They are printed separately, and the validation losses only appear in the output at the end of the run.
Where is the actual logging taking place in trainer.py? I'd like to know so that I can output a single dictionary containing all the metrics.
I am using transformers 3.3.0 and run_glue.py with the flag --evaluation_strategy steps, setting a low value (32) for both --logging_steps and --eval_steps. I am confused because evaluation on the validation set doesn't seem to occur every eval_steps.
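In case it helps, those flags correspond roughly to the following TrainingArguments (a minimal sketch rather than my exact invocation; output_dir is a placeholder and all other arguments are left at their defaults):

from transformers import TrainingArguments

# Rough equivalent of: --evaluation_strategy steps --logging_steps 32 --eval_steps 32
training_args = TrainingArguments(
    output_dir="./output",        # placeholder
    evaluation_strategy="steps",  # evaluate during training, every eval_steps
    logging_steps=32,             # log training loss/lr/epoch every 32 optimization steps
    eval_steps=32,                # evaluate on the validation set every 32 optimization steps
)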
logging_steps and eval_steps have different meanings:
logging_steps only logs the training loss, learning rate, epoch, etc., not the eval metrics; eval_steps logs the metrics on the validation set.
Here the steps refer to actual optimization steps, so if you are using 2 gradient accumulation steps and your batch size is 4, then 1 optimization step corresponds to 8 training samples; in that case, if your eval_steps is 2, the eval metrics will be logged after every 16 samples.
In the latest version, if eval_steps is not specified, it'll be set to logging_steps by default.
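In code, the arithmetic in that example works out like this (a sketch with the same illustrative numbers, assuming a single device):

# eval_steps counts optimization steps, so the number of training samples seen
# between two evaluations is batch_size * grad_accumulation * eval_steps.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
eval_steps = 2

samples_per_optimization_step = per_device_train_batch_size * gradient_accumulation_steps  # 8
samples_between_evals = samples_per_optimization_step * eval_steps                         # 16
print(samples_per_optimization_step, samples_between_evals)  # 8 16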
Validation metrics (e.g. eval_mcc, eval_loss) do not appear in the stdout logs before the end of the run, whereas training losses are displayed (as expected) every 8 trainer steps. Is this intended behavior?
Shouldn't the eval metrics appear every eval_steps * per_device_train_batch_size samples (or something along those lines)?
Thanks! Trying out my notebook above with the latest version (3.4.0), everything seems in order with evaluation_strategy=steps, but that wasn't the case with 3.3.0, at least in my attempts.
So the update seems to have cleared this up. I haven't checked which change in the code is responsible for the difference in behavior I observed.
I am facing exactly the same problem with run_qa.py: evaluation results are only printed at the end of training. I tried a number of configs, including --evaluation_strategy steps, but none helped.
hey @lewtun,
Thanks for your reply. When running the above I get train_results.json and eval_results.json with the following content:
train_results.json
{
  "epoch": 2.0,
  "init_mem_cpu_alloc_delta": 2540351488,
  "init_mem_cpu_peaked_delta": 0,
  "init_mem_gpu_alloc_delta": 266590720,
  "init_mem_gpu_peaked_delta": 0,
  "train_mem_cpu_alloc_delta": 16502784,
  "train_mem_cpu_peaked_delta": 331776,
  "train_mem_gpu_alloc_delta": 822129664,
  "train_mem_gpu_peaked_delta": 3155901440,
  "train_runtime": 18.1442,
  "train_samples": 168,
  "train_samples_per_second": 1.543
}
hey @Mariam, my understanding is that the reason you see both the training and validation losses in the tutorial is that it runs in a Jupyter notebook, so the Trainer uses the NotebookProgressCallback (link) to display the results during training.
when running the script, the PrinterCallback (link) is used instead, and this callback only logs the training loss, learning rate, and epoch number.
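one way to surface the eval metrics when running a script is to attach a small custom callback that prints every logged dict; here's a minimal sketch (the class name is just illustrative, and it relies on the standard TrainerCallback API):

from transformers import TrainerCallback

class PrintAllMetricsCallback(TrainerCallback):
    # Print every dict the Trainer logs, training and evaluation metrics alike.
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            print(f"step {state.global_step}: {logs}")

# attach it before training, e.g. trainer.add_callback(PrintAllMetricsCallback())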
since you’re doing question-answering, my suggestion would be to either: