How to get per-eval-step score when using trainer?

I am training a model and would like to get the per-evaluation-step metrics that are logged during training. I currently set --evaluation_strategy steps and can see the logs being printed. However, is there a way to get the table as an object (JSON or pd.DataFrame) in Python?

|Step|Training Loss|Validation Loss|Rouge1|Rouge2|Rougel|Rougelsum|Gen Len|Runtime|Samples Per Second|
|---|---|---|---|---|---|---|---|---|---|
|100|No log|2.458449|39.987500|16.891800|37.788500|37.794300|10.292600|26.358400|44.730000|
|200|No log|2.128455|49.043000|24.468600|47.341900|47.270600|10.463100|26.245500|44.922000|
|300|No log|1.980806|51.324400|25.405300|49.549300|49.507100|10.305300|25.733800|45.815000|
|400|No log|1.892222|53.523700|27.361200|51.650900|51.613200|10.371500|25.708300|45.861000|

Hi @mralexis, you could write a simple callback that saves the logs to disk, e.g. by adapting the PrinterCallback (see the transformers.trainer_callback page of the transformers 4.3.0 documentation).

You can then pass your callback to the Trainer with the callbacks argument, and load and process the saved logs after training 😄
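A minimal sketch of such a callback, assuming a transformers version where on_log receives the logs dictionary as a keyword argument (as PrinterCallback does). The names JsonlLoggerCallback and load_logs and the default file name are hypothetical, and the import fallback is only there so the snippet can be tried without transformers installed.

```python
import json

try:
    from transformers import TrainerCallback
except ImportError:
    # Fallback so the sketch can be exercised without transformers installed.
    TrainerCallback = object


class JsonlLoggerCallback(TrainerCallback):
    """Hypothetical callback: append every log dict to a JSON Lines file."""

    def __init__(self, path="trainer_logs.jsonl"):
        self.path = path

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Only the main process writes, to avoid duplicate rows in distributed runs.
        if logs is None or not state.is_world_process_zero:
            return
        record = {"step": state.global_step, **logs}
        with open(self.path, "a") as f:
            # default=str keeps the write from failing on non-JSON value types.
            f.write(json.dumps(record, default=str) + "\n")


def load_logs(path="trainer_logs.jsonl"):
    """Read the file back as a list of dicts (one per logging call)."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

It would be passed as Trainer(..., callbacks=[JsonlLoggerCallback("logs.jsonl")]). After training, pd.DataFrame(load_logs("logs.jsonl")) should give the logs as a DataFrame, assuming pandas is available.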

Thanks, @lewtun!

Out of curiosity, how does the load_best_model_at_end parameter work? If it loads the best model across the evaluations run during training, then there must be something like a best_step stored somewhere?

Hi @mralexis, as described in the docs, load_best_model_at_end works in conjunction with metric_for_best_model.

Under the hood, the Trainer keeps track of the best checkpoint seen so far via the Trainer.state.best_model_checkpoint attribute, which load_best_model_at_end then uses to reload that checkpoint when training finishes.
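After training, these values can be read directly from trainer.state (best_model_checkpoint and best_metric). The same state is also written to a trainer_state.json file inside each saved checkpoint folder, so it can be inspected without the Trainer object. A minimal sketch, assuming that file layout; the helper name read_best_checkpoint is hypothetical.

```python
import json
import os


def read_best_checkpoint(checkpoint_dir):
    """Return (best_model_checkpoint, best_metric) from a checkpoint's trainer_state.json."""
    with open(os.path.join(checkpoint_dir, "trainer_state.json")) as f:
        state = json.load(f)
    # Either value may be None (null in the file) if no best model was tracked.
    return state.get("best_model_checkpoint"), state.get("best_metric")
```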


@lewtun Just to be sure: does the Trainer object not keep track of the metrics in a way that we as end users can access as well? Writing a callback for this feels a bit cumbersome, as everything we want is already displayed at every step/epoch by the Trainer object.

I do not know how Hugging Face stores these metrics internally, but to me it seems most logical that it keeps the metrics in memory somewhere and re-displays the entire dataframe after each step. But I could be wrong here.
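For reference, the Trainer does keep the logged values in memory: trainer.state.log_history is a list of dictionaries, one per logging or evaluation call, each carrying a step key. A minimal sketch of turning it into one row per step follows. The helper name merge_logs_by_step is hypothetical, and the eval_rouge1 key in the sample assumes the default eval_ metric prefix; the numbers are copied from the table in the question.

```python
def merge_logs_by_step(log_history):
    """Merge all log entries that share a step into one row per step."""
    rows = {}
    for entry in log_history:
        step = entry.get("step")
        if step is None:
            continue  # skip entries that carry no step
        rows.setdefault(step, {"step": step}).update(entry)
    return [rows[step] for step in sorted(rows)]


# Sample shaped like trainer.state.log_history (key names are an assumption).
sample = [
    {"eval_loss": 2.458449, "eval_rouge1": 39.9875, "step": 100},
    {"eval_loss": 2.128455, "eval_rouge1": 49.043, "step": 200},
]
table = merge_logs_by_step(sample)
```

With pandas installed, pd.DataFrame(merge_logs_by_step(trainer.state.log_history)) should give the table as a DataFrame.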