How to evaluate all model ckpts when using `run_language_modeling` with trainer?

Here it seems the code only evaluates the last ckpt, while the common practice is to evaluate all ckpts and select the best one. How could we achieve that with trainer? Thanks!

This code evaluates the model you pass to it. I think you mean that the train method does not save the best checkpoint at the root at the end of training, only the last checkpoint.

@sgugger yes, sort of. Ideally it should pick the best model from all ckpts and save it at the root. To do so I have to modify the code to keep track of the best model, right?

Yes, there is nothing for that in Trainer yet, so you should change the code to check at each eval whether the model is better or not. This is something I have on my TODO list to add at some point.

But there seems to be no callback function at each save_step. For example, the correct pipeline should be something like:

best_perf = 0  # assume larger is better
if current_step % save_step == 0:
    current_perf = check_model_perf()
    if current_perf > best_perf:
        best_perf = current_perf
        save_model()  # will also delete any previous best model

But now I have to save all dumps first, and then check performance in the eval stage. Is this the only thing I could do? Thanks, @sgugger

If you’re not changing the code inside Trainer, yes, it’s the only thing you can do for now. It’s not ideal, but like I said, making this better is on my TODO.
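
For anyone taking the "save everything, then evaluate" route, here is a rough sketch of the post-hoc selection step. It is not part of Trainer. It assumes the checkpoints sit in sub-folders named `checkpoint-<step>` under the output directory, and `evaluate_checkpoint` is a hypothetical callback you would write yourself (for example, one that loads the model from the folder and returns its eval metric). The helper names `list_checkpoints`, `select_best_checkpoint` and `copy_best_to_root` are made up for illustration.

```python
import re
import shutil
from pathlib import Path

# Assumption: each checkpoint is a sub-folder named "checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(output_dir):
    """Return (step, path) pairs for every checkpoint folder, sorted by step."""
    found = []
    for child in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return sorted(found, key=lambda pair: pair[0])


def select_best_checkpoint(output_dir, evaluate_checkpoint, larger_is_better=True):
    """Score every checkpoint; return (path, score) of the best, or None if empty."""
    best = None
    for _step, path in list_checkpoints(output_dir):
        score = evaluate_checkpoint(path)  # hypothetical user-supplied callback
        if best is None:
            best = (path, score)
        elif larger_is_better and score > best[1]:
            best = (path, score)
        elif not larger_is_better and score < best[1]:
            best = (path, score)
    return best


def copy_best_to_root(output_dir, best_path):
    """Copy the files of the chosen checkpoint into the root of output_dir."""
    for item in Path(best_path).iterdir():
        if item.is_file():
            shutil.copy2(item, Path(output_dir) / item.name)
```

For a metric where lower is better, such as perplexity, pass `larger_is_better=False`. Ties keep the earlier checkpoint, since only a strictly better score replaces the current best.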
