Thanks so much @lewtun!
I believe I managed to tweak the `evaluate()` method, but now I am struggling to log the metrics inside `on_evaluate()`. I keep getting the following error:
But when I inspect the `log_history`, I have both the training metrics and the `eval_loss` for the first epoch. I have been trying to get at `metrics` inside `on_evaluate()`, but I haven't had any success. I suspect it must be because of how I customized `evaluate()` to output a dictionary with the validation and training metrics, so below you can find my code (with a small sketch of my callback at the end).
```python
import collections
import time
from typing import Dict, List, Optional

from torch.utils.data import Dataset
from transformers import Trainer
from transformers.file_utils import is_torch_tpu_available
from transformers.trainer_utils import speed_metrics

if is_torch_tpu_available():
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.metrics as met


class MyTrainer(Trainer):
    def __init__(
        self,
        model,
        args=None,
        data_collator=None,
        train_dataset=None,
        eval_dataset=None,
        tokenizer=None,
        model_init=None,
        compute_metrics=None,
        callbacks=None,
        optimizers=(None, None),
    ):
        super().__init__(model, args, data_collator, train_dataset, eval_dataset,
                         tokenizer, model_init, compute_metrics, callbacks, optimizers)

    def evaluate(
        self,
        train_dataset=None,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
    ) -> Dict[str, float]:
        # memory metrics - must set up as early as possible
        self._memory_tracker.start()

        if eval_dataset is not None and not isinstance(eval_dataset, collections.abc.Sized):
            raise ValueError("eval_dataset must implement __len__")

        train_dataloader = self.get_train_dataloader()
        eval_dataloader = self.get_eval_dataloader(eval_dataset)

        start_time = time.time()

        # Run the prediction loop over the training set to get training metrics
        train_output = self.prediction_loop(
            train_dataloader,
            description="Training",
            prediction_loss_only=True if self.compute_metrics is None else None,
            ignore_keys=ignore_keys,
            metric_key_prefix="train",
        )
        # Run the prediction loop over the evaluation set as usual
        eval_output = self.prediction_loop(
            eval_dataloader,
            description="Evaluation",
            # No point gathering the predictions if there are no metrics, otherwise we defer to
            # self.args.prediction_loss_only
            prediction_loss_only=True if self.compute_metrics is None else None,
            ignore_keys=ignore_keys,
            metric_key_prefix=metric_key_prefix,
        )

        train_n_samples = len(self.train_dataset)
        train_output.metrics.update(speed_metrics("train", start_time, train_n_samples))
        self.log(train_output.metrics)

        eval_n_samples = len(eval_dataset if eval_dataset is not None else self.eval_dataset)
        eval_output.metrics.update(speed_metrics(metric_key_prefix, start_time, eval_n_samples))
        self.log(eval_output.metrics)

        if self.args.tpu_metrics_debug or self.args.debug:
            # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
            xm.master_print(met.metrics_report())

        # Fire on_evaluate twice: once with the eval metrics, once with the train metrics
        self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, eval_output.metrics)
        self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, train_output.metrics)

        self._memory_tracker.stop_and_update_metrics(train_output.metrics)
        self._memory_tracker.stop_and_update_metrics(eval_output.metrics)

        dic = {
            "Training metrics": train_output.metrics,
            "Validation metrics": eval_output.metrics,
        }
        return dic
```
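
And this is roughly what I have on the callback side. It is only a minimal sketch: `MyMetricsCallback` is just a placeholder name, and the `on_evaluate` signature is the standard `TrainerCallback` one, which receives the logged dict as the `metrics` keyword argument:

```python
from transformers import TrainerCallback


class MyMetricsCallback(TrainerCallback):
    """Sketch of the callback where I am trying to read the metrics."""

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` is the dict passed to callback_handler.on_evaluate(...) in the
        # custom evaluate() above (first the eval metrics, then the train metrics).
        print("on_evaluate received:", metrics)
        # state.log_history keeps everything logged so far via self.log(...)
        print("last log_history entry:", state.log_history[-1] if state.log_history else None)
```

I pass it to the trainer with `callbacks=[MyMetricsCallback()]`.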