Hi! I have a use case where I would like evaluation to happen at the beginning of training (before a training step has been taken) in addition to every n steps. I can easily get the latter using evaluation_strategy and eval_steps, but not sure how to get the former.
There is an argument, logging_first_step, that sounds like it should do exactly what I need:
logging_first_step (bool, optional, defaults to False) — Whether to log and evaluate the first global_step or not.
But providing this argument does not lead to evaluation at the beginning of training like I would expect (tested with the run_summarization.py script). Does anyone have an idea how to get this behaviour from the HF Trainer?
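For concreteness, the every-n-steps part corresponds to settings along these lines (I actually pass them as command-line flags to the script; the values here are placeholders):

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",             # placeholder
    evaluation_strategy="steps",  # evaluate every eval_steps
    eval_steps=500,               # placeholder value
    logging_first_step=True,      # logs the first step, but does not trigger an evaluation at step 0
)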
Thanks for the quick response! I guess this solution works, but the results do not end up in the log_history field of the trainer_state.json file, which is how I am tracking performance over time. So, two questions:
Is there a better way to track evaluation metrics over time (using the provided example scripts like run_summarization.py) than the log_history field of trainer_state.json? (See the sketch below for how I currently read that field.)
Might adding this functionality (evaluating before a training step has been taken) be a good idea? It seems like a pretty ubiquitous use case: you often want to plot performance over time, and knowing how the model performs (whether randomly initialized or pre-trained) before the first training step is useful. I would be happy to take a crack at this if I could get some advice on where to implement it.
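For context, here is roughly how I read the metrics out of trainer_state.json at the moment (the checkpoint path is just a placeholder):

import json

# the Trainer writes trainer_state.json into each checkpoint directory
with open("output_dir/checkpoint-500/trainer_state.json") as f:
    state = json.load(f)

# log_history is a list of dicts; evaluation entries carry keys like "eval_loss"
eval_entries = [e for e in state["log_history"] if "eval_loss" in e]
for e in eval_entries:
    print(e["step"], e["eval_loss"])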
Thanks for the post. One small issue with this approach: by the time the evaluation runs, one training step has already been taken, so it is not the same as evaluating the model without any training.
Doesn’t that get solved using on_step_begin? Works for me for NLLB finetuning with Seq2SeqTrainingArguments. The only caveat seems to be that train loss doesn’t exist at this point so the wandb plots are offset.
from transformers import TrainerCallback

class EvaluateFirstStepCallback(TrainerCallback):
    def on_step_begin(self, args, state, control, **kwargs):
        # request an evaluation at this point in training
        if state.global_step == 1:
            control.should_evaluate = True

trainer.add_callback(EvaluateFirstStepCallback())
on_step_begin(): called when step % args.gradient_accumulation_steps == 0, before all other operations in the iteration except restoring the random state; at that point global_step has not yet been incremented and the model parameters have not yet been updated.
on_step_end(): called after all operations in the training iteration, once global_step has been incremented; by the time global_step == 1 (with a gradient accumulation step of 1), the model parameters have already been updated once.
Therefore, the check needs to happen in on_step_begin() at global_step == 0 to ensure that the model parameters are evaluated before they have been updated.
The corresponding code is below:
from transformers import TrainerCallback

class EvaluateFirstStepCallback(TrainerCallback):
    def on_step_begin(self, args, state, control, **kwargs):
        # global_step is still 0 here, i.e. no parameters have been updated yet
        if state.global_step == 0:
            control.should_evaluate = True

trainer.add_callback(EvaluateFirstStepCallback())
I wish they’d fix this in the actual code. It’s so annoying that the flag for logging delay doesn’t work as expected. Wasted 10-15 mins finding this solution.
Now (2024) you can pass the parameter eval_on_start to your TrainingArguments object to make the model evaluate before any training step is taken.
This parameter was originally introduced under the now-deprecated name "sanity_evaluation", as introduced here, and later renamed.
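With that option the callback above shouldn't be needed any more; a minimal sketch (eval_strategy is just the newer name for evaluation_strategy, and the other values are placeholders):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # placeholder
    eval_strategy="steps",   # evaluate every eval_steps, as before
    eval_steps=500,          # placeholder value
    eval_on_start=True,      # run one evaluation before the first training step
)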