Why do I get no validation loss and why are metrics not calculated?

Hello, I have been racking my brain over this issue for two days now and wanted to reach out for help.
I have attached the code that produces the error at the bottom.

TLDR: Why do I get no validation loss and why are metrics not calculated?

The first problem is that the validation loss column shows ‘No log’ for the entire training run. This happens when I don’t use metric_for_best_model='eval_loss', load_best_model_at_end=True and callbacks=[EarlyStoppingCallback(early_stopping_patience=2)].
I have seen this issue and other similar issues online but I couldn’t find a solution that works for me yet.

The other (probably related) issue is that when I do use load_best_model_at_end and early stopping, I get the following error message:

KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 trainer.train()

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1543, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1538     self.model_wrapped = self.model
   1540 inner_training_loop = find_executable_batch_size(
   1541     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1542 )
-> 1543 return inner_training_loop(
   1544     args=args,
   1545     resume_from_checkpoint=resume_from_checkpoint,
   1546     trial=trial,
   1547     ignore_keys_for_eval=ignore_keys_for_eval,
   1548 )

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1868, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1865     self.state.epoch = epoch + (step + 1) / steps_in_epoch
   1866     self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 1868     self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1869 else:
   1870     self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2135, in Trainer._maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
   2132     self._report_to_hp_search(trial, self.state.global_step, metrics)
   2134 if self.control.should_save:
-> 2135     self._save_checkpoint(model, trial, metrics=metrics)
   2136     self.control = self.callback_handler.on_save(self.args, self.state, self.control)

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2238, in Trainer._save_checkpoint(self, model, trial, metrics)
   2236 if not metric_to_check.startswith("eval_"):
   2237     metric_to_check = f"eval_{metric_to_check}"
-> 2238 metric_value = metrics[metric_to_check]
   2240 operator = np.greater if self.args.greater_is_better else np.less
   2241 if (
   2242     self.state.best_metric is None
   2243     or self.state.best_model_checkpoint is None
   2244     or operator(metric_value, self.state.best_metric)
   2245 ):

KeyError: 'eval_loss'

When I train the model without evaluation during training and evaluate afterwards, I get the output below, so there are no metrics or loss here either.

early stopping required metric_for_best_model, but did not find eval_loss so early stopping is disabled
{'eval_runtime': 8.2721,
 'eval_samples_per_second': 146.396,
 'eval_steps_per_second': 3.143}
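
For illustration, the failing lookup from the traceback can be reproduced with the metrics dictionary printed above. This is a minimal sketch, not Trainer code; the helper name lookup_best_metric is made up for the example.

```python
# Metrics as returned by the evaluation above: runtime statistics only, no eval_loss.
metrics = {
    "eval_runtime": 8.2721,
    "eval_samples_per_second": 146.396,
    "eval_steps_per_second": 3.143,
}


def lookup_best_metric(metrics, metric_for_best_model="loss"):
    """Hypothetical helper mirroring the lookup shown in the traceback."""
    metric_to_check = metric_for_best_model
    if not metric_to_check.startswith("eval_"):
        metric_to_check = f"eval_{metric_to_check}"
    return metrics[metric_to_check]  # raises KeyError if the key is missing


try:
    lookup_best_metric(metrics)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('eval_loss')
```

So the KeyError and the missing validation loss look like the same symptom: the evaluation never produces an eval_loss entry.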

I worked my way through the trainer code with a debugger and eventually got to the evaluation loop. Line 3110 of trainer.py specifies that metrics are only computed when self.compute_metrics is not None, all_preds is not None and all_labels is not None. But for some reason all_labels is None.
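
The condition described above can be sketched roughly as follows. This is a simplified paraphrase, not the actual Trainer source; the function maybe_compute_metrics and the toy accuracy function are assumptions made for the example.

```python
def maybe_compute_metrics(compute_metrics, all_preds, all_labels):
    """Return metrics only if a metric function, predictions and labels are all present."""
    if compute_metrics is not None and all_preds is not None and all_labels is not None:
        return compute_metrics(all_preds, all_labels)
    # Any missing piece means no metrics are computed at all.
    return {}


def toy_accuracy(preds, labels):
    """Toy metric function used only for this illustration."""
    correct = sum(int(p == l) for p, l in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


with_labels = maybe_compute_metrics(toy_accuracy, [0, 1, 1, 0], [0, 1, 0, 0])
without_labels = maybe_compute_metrics(toy_accuracy, [0, 1, 1, 0], None)

print(with_labels)     # {'accuracy': 0.75}
print(without_labels)  # {}
```

With all_labels set to None, the metric function is never called, which matches the empty evaluation output.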

That is how far I got today. Can anyone tell me what the underlying problem could be here?

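from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

def compute_metrics(pred):
    # pred.label_ids holds the true labels, pred.predictions the raw logits
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='micro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
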
training_args = TrainingArguments(
    evaluation_strategy='steps',
    logging_dir="./logs/",
    logging_steps=logging_steps,
    eval_steps=logging_steps,
    save_steps=logging_steps,
    save_total_limit=3,
    label_names=list(label_dict.keys()),
    no_cuda=False
)

trainer = Trainer(
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

Just replying for anyone who has a similar problem: I fixed it by not using label_names. I must have misunderstood how label_names should be used. I thought it let you provide descriptive names for the classes instead of the labels 0, 1, 2, …, k.
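
If the goal is descriptive class names, one common place for them is the id2label / label2id mappings of the model config rather than label_names. A minimal sketch, assuming label_dict maps class names to integer ids; its contents below are made up, since the thread does not show the real dictionary, and `checkpoint` in the comment is a placeholder.

```python
# Hypothetical contents; the real label_dict is not shown in this thread.
label_dict = {"negative": 0, "neutral": 1, "positive": 2}

# Name -> id and id -> name mappings derived from label_dict.
label2id = dict(label_dict)
id2label = {idx: name for name, idx in label_dict.items()}

# These mappings can be passed when loading the model, for example:
#   AutoModelForSequenceClassification.from_pretrained(
#       checkpoint, num_labels=len(label2id), id2label=id2label, label2id=label2id
#   )
# while label_names is simply left out of TrainingArguments.
print(id2label[2])  # positive
```

The dataset itself still carries the integer class ids in its label column; only the display names live in the config.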


Hi @bangsandglasses, I am also stuck on this problem. Can you tell me more about how to solve it? It is not very clear to me what "not using label_names" means in practice.

label_names is the list of column names (the keys in the input dictionary) that the Trainer treats as labels during validation. It is not a list of class names.
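
To make that concrete, here is a rough sketch of how labels get picked out of a batch by key. It is a simplified illustration, not the actual Trainer source; the function extract_labels, the batch contents and the class names are all made up for the example.

```python
def extract_labels(inputs, label_names):
    """Return the label columns named in label_names, or None if any of them is missing."""
    has_labels = len(label_names) > 0 and all(
        inputs.get(name) is not None for name in label_names
    )
    if not has_labels:
        return None
    labels = tuple(inputs[name] for name in label_names)
    return labels[0] if len(labels) == 1 else labels


# A toy batch with one label column called "labels".
batch = {
    "input_ids": [[101, 2023, 102]],
    "attention_mask": [[1, 1, 1]],
    "labels": [2],
}

# The column name matches, so labels are found.
print(extract_labels(batch, ["labels"]))  # [2]

# Class names are not column names, so nothing is found.
print(extract_labels(batch, ["negative", "neutral", "positive"]))  # None
```

With class names passed as label_names, no key in the batch matches, so the labels end up as None and neither eval_loss nor the custom metrics are computed. Leaving label_names unset, or setting it to the actual label column name, avoids this.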