How to evaluate all model ckpts when using `run_language_modeling` with trainer?

mralexis · August 20, 2020, 5:38pm

huggingface/transformers/blob/master/examples/language-modeling/run_language_modeling.py#L272


    # For convenience, we also re-save the tokenizer to the same directory,
    # so that you can share your model easily on huggingface.co/models =)
    if trainer.is_world_master():
        tokenizer.save_pretrained(training_args.output_dir)

# Evaluation
results = {}
if training_args.do_eval:
    logger.info("*** Evaluate ***")

    eval_output = trainer.evaluate()

    perplexity = math.exp(eval_output["eval_loss"])
    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(training_args.output_dir, "eval_results_lm.txt")
    if trainer.is_world_master():
        with open(output_eval_file, "w") as writer:
            logger.info("***** Eval results *****")
            for key in sorted(result.keys()):
                logger.info("  %s = %s", key, str(result[key]))

Here it seems the code only evaluates the last ckpt, while the common practice is to evaluate all ckpts and select the best one. How could we achieve that with trainer? Thanks!

sgugger · August 20, 2020, 9:06pm

This code evaluates the model you pass to it. I think you mean that the train method does not save the best checkpoint at the root at the end of training, only the last checkpoint.

mralexis · August 20, 2020, 9:15pm

@sgugger yes, sort of. Ideally it should pick the best model from all ckpts and save it at the root. To do so I have to modify the code to keep track of the best model, right?

sgugger · August 21, 2020, 1:06pm

Yes, there is nothing for that in Trainer yet, so you should change the code to check at each eval whether the model is better or not. This is something I have on my TODO list to add at some point.
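
The check described above could be sketched roughly as follows. This is only an illustration: `BestCheckpointTracker` is a hypothetical helper, not part of Trainer, the loss values are made up, and the sketch assumes that a lower eval loss (and so a lower perplexity, computed as in the script with `math.exp`) means a better model.

```python
import math


class BestCheckpointTracker:
    """Remember the evaluation step with the lowest loss seen so far."""

    def __init__(self):
        self.best_loss = math.inf
        self.best_step = None

    def update(self, step, eval_loss):
        """Record one evaluation; return True if it is the new best."""
        if eval_loss < self.best_loss:
            self.best_loss = eval_loss
            self.best_step = step
            return True
        return False

    @property
    def best_perplexity(self):
        # Same conversion the script uses: perplexity = exp(eval_loss).
        return math.exp(self.best_loss)


tracker = BestCheckpointTracker()
# Made-up (step, eval_loss) pairs standing in for real evaluation results.
for step, loss in [(500, 3.2), (1000, 2.9), (1500, 3.0)]:
    if tracker.update(step, loss):
        # This is where a modified training loop would save the model.
        print(f"step {step}: new best, perplexity {tracker.best_perplexity:.2f}")
```

Where `update` gets called depends on how the training loop is modified; the sketch only covers the bookkeeping.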

mralexis · August 21, 2020, 4:47pm

But there seems to be no callback function at each save_step. For example, the correct pipeline should be something like

best_perf = 0  # assume larger is better
if current_step % save_step == 0:
    current_perf = check_model_perf()
    if current_perf > best_perf:
        best_perf = current_perf
        save_model()  # will also delete any previous best model
But now I have to save all dumps first, and then check performance in the eval stage. Is this the only thing I could do? Thanks, @sgugger

sgugger · August 21, 2020, 4:55pm

If you’re not changing the code inside Trainer, yes, it’s the only thing you can do for now. It’s not ideal, but like I said, making this better is on my TODO.
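
A minimal sketch of that workaround, evaluating every saved checkpoint after training and picking the best one. It assumes the checkpoints sit in `checkpoint-<step>` subdirectories of the output directory, and that lower loss is better. `find_checkpoints`, `select_best_checkpoint`, and `evaluate_fn` are hypothetical names; `evaluate_fn` stands for whatever you write to load a model from a checkpoint path (for instance with `from_pretrained`) and return its eval loss (for instance from `trainer.evaluate()["eval_loss"]`).

```python
import os
import re

# Assumed naming of saved checkpoint directories: checkpoint-<step>.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_checkpoints(output_dir):
    """Return (step, path) pairs for checkpoint-<step> subdirectories, sorted by step."""
    found = []
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    return sorted(found)


def select_best_checkpoint(output_dir, evaluate_fn):
    """Evaluate every checkpoint and return (path, loss) for the lowest loss."""
    results = {path: evaluate_fn(path) for _, path in find_checkpoints(output_dir)}
    if not results:
        raise ValueError(f"no checkpoints found in {output_dir}")
    best_path = min(results, key=results.get)
    return best_path, results[best_path]
```

The best directory returned here could then be copied to the root of the output directory by hand. Keeping every checkpoint on disk until this step runs is the cost of not changing Trainer itself.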
