Logging training accuracy using the Trainer class

Hello,

I am running BertForSequenceClassification and I would like to log the accuracy as well as other metrics that I have already defined for my training set. I saw in another issue that I have to add a self.evaluate(self.train_dataset) somewhere in the code, but I am a beginner when it comes to Python and deep learning in general, so I am not sure exactly where I have to include it.

I was trying to replicate the evaluate() method of the Trainer class, taking the train_dataset as an argument, but it did not work. It would really mean a lot if you could guide me as to where I should tweak the code!

Thank you for your help!


hey @dbejarano31, assuming that you want to log the training metrics during training, i think there are (at least) two options:

  1. subclass TrainerCallback (docs) to create a custom callback that logs the training metrics by hooking into the on_evaluate event (a rough sketch follows below)
  2. subclass Trainer and override the evaluate function (docs) to inject the additional evaluation code

option 2 might be easier to implement since you can use the existing logic as a template :slight_smile:
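
for illustration, here's a rough sketch of what option 1 could look like — the callback name, the recursion guard, and the wiring via add_callback are my assumptions, not something built into the library:

```python
from transformers import TrainerCallback


class TrainMetricsCallback(TrainerCallback):
    """Illustrative sketch: after each evaluation, also evaluate on the training set."""

    def __init__(self, trainer):
        self._trainer = trainer
        self._running = False  # guard: trainer.evaluate() fires on_evaluate again

    def on_evaluate(self, args, state, control, **kwargs):
        if self._running:
            return
        self._running = True
        # reuse the trainer's own evaluate() so the train_* metrics land in log_history
        self._trainer.evaluate(
            eval_dataset=self._trainer.train_dataset,
            metric_key_prefix="train",
        )
        self._running = False
```

you would then attach it after creating the trainer with trainer.add_callback(TrainMetricsCallback(trainer)).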


Thanks so much @lewtun!

I believe I managed to tweak the evaluate() method, but now I am struggling to log the metrics inside on_evaluate().

I keep getting an error when the on_evaluate callback runs.

But when I inspect the log_history, I have both the training metrics and the eval_loss for the first epoch. I have been trying to track down the metrics but haven't had any success. I suspect it is because of how I customized evaluate() to output a dictionary with the validation and training metrics, so you can find my code below.

```python
import collections.abc
import time
from typing import Dict, List, Optional

from torch.utils.data import Dataset
from transformers import Trainer, is_torch_tpu_available
from transformers.trainer_utils import speed_metrics

if is_torch_tpu_available():
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.metrics as met


class MyTrainer(Trainer):
    def __init__(
        self,
        model,
        args=None,
        data_collator=None,
        train_dataset=None,
        eval_dataset=None,
        tokenizer=None,
        model_init=None,
        compute_metrics=None,
        callbacks=None,
        optimizers=(None, None),
    ):
        super().__init__(model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init,
                         compute_metrics, callbacks, optimizers)

    def evaluate(
        self,
        train_dataset=None,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
    ) -> Dict[str, float]:

        # memory metrics - must set up as early as possible
        self._memory_tracker.start()

        if eval_dataset is not None and not isinstance(eval_dataset, collections.abc.Sized):
            raise ValueError("eval_dataset must implement __len__")

        train_dataloader = self.get_train_dataloader()
        eval_dataloader = self.get_eval_dataloader(eval_dataset)
        start_time = time.time()

        # run the prediction loop over the training set to compute the training metrics
        train_output = self.prediction_loop(
            train_dataloader,
            description="Training",
            prediction_loss_only=True if self.compute_metrics is None else None,
            ignore_keys=ignore_keys,
            metric_key_prefix="train",
        )

        eval_output = self.prediction_loop(
            eval_dataloader,
            description="Evaluation",
            # No point gathering the predictions if there are no metrics, otherwise we defer to
            # self.args.prediction_loss_only
            prediction_loss_only=True if self.compute_metrics is None else None,
            ignore_keys=ignore_keys,
            metric_key_prefix=metric_key_prefix,
        )

        train_n_samples = len(self.train_dataset)
        train_output.metrics.update(speed_metrics("train", start_time, train_n_samples))
        self.log(train_output.metrics)

        eval_n_samples = len(eval_dataset if eval_dataset is not None else self.eval_dataset)
        eval_output.metrics.update(speed_metrics(metric_key_prefix, start_time, eval_n_samples))
        self.log(eval_output.metrics)

        if self.args.tpu_metrics_debug or self.args.debug:
            # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
            xm.master_print(met.metrics_report())

        self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, eval_output.metrics)
        self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, train_output.metrics)

        self._memory_tracker.stop_and_update_metrics(train_output.metrics)
        self._memory_tracker.stop_and_update_metrics(eval_output.metrics)

        return {
            "Training metrics": train_output.metrics,
            "Validation metrics": eval_output.metrics,
        }
```
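
For reference, here is a minimal sketch of how a trainer like this could be wired up to report accuracy — `model`, `training_args`, `train_ds` and `val_ds` are placeholder names, and the `compute_metrics` function is just one way to compute accuracy:

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred unpacks into (logits, labels) for BertForSequenceClassification
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

trainer = MyTrainer(
    model=model,                 # e.g. a BertForSequenceClassification instance
    args=training_args,          # a TrainingArguments object
    train_dataset=train_ds,
    eval_dataset=val_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()               # returns the dict with training and validation metrics
```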

hmm that is odd indeed. the error seems to be coming from NotebookProgressCallback which expects metrics to have an eval_loss field: transformers/notebook.py at 02f7c2fe66cf3ef11402adc3d9d8a3ddd189c717 · huggingface/transformers · GitHub

as a dirty hack, what happens if you do the following before the on_evaluate step is called:

```python
eval_output.metrics["eval_loss"] = "No log"
self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, eval_output.metrics)
self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, train_output.metrics)
```

this ensures the eval metrics have an entry for eval_loss before the callback is called. if this hack works, it might be a bug in the callback that we should fix :slight_smile:


@lewtun Tried the hack and it worked!! Thanks so much for your help!

More like an oversight, since this callback predates `metric_key_prefix`, so at the time it was impossible to have anything other than `eval_loss` :wink:


thanks for the context! would you agree that the callback could do with an upgrade to cover this case? (i’m happy to do it)


If you want to have a go, by all means!

I agree with lewtun's comment. The outputs of the training_step function can only be accessed in compute_loss. Inside compute_loss, loss = outputs[0], but the other elements of outputs are not used. Sometimes we want to report other metrics (e.g., training accuracy) alongside the loss and print them to TensorBoard, but this version cannot access the outputs during training.
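
A rough sketch of that idea, assuming a classification model whose outputs start with (loss, logits) when labels are passed — this is not built-in behaviour, just one way to surface a batch-level training accuracy:

```python
from transformers import Trainer


class AccuracyLoggingTrainer(Trainer):
    """Sketch: log a batch-level training accuracy from inside compute_loss."""

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        loss = outputs[0]    # first element is the loss when labels are supplied
        logits = outputs[1]  # second element holds the logits for classification heads
        if labels is not None:
            preds = logits.argmax(dim=-1)
            acc = (preds == labels).float().mean().item()
            # self.log sends the value to the configured loggers (e.g. TensorBoard)
            self.log({"train_batch_accuracy": acc})
        return (loss, outputs) if return_outputs else loss
```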