Why do I get no validation loss and why are metrics not calculated?

Hello, I have been racking my brain over this issue for two days now, and I wanted to reach out and get some help.
The code that produces the error is attached at the bottom.

TLDR: Why do I get no validation loss and why are metrics not calculated?

The first problem is that the validation loss column during training says ‘No log’ the entire time. That happens when I don’t use metric_for_best_model='eval_loss', load_best_model_at_end=True, and callbacks=[EarlyStoppingCallback(early_stopping_patience=2)].
I have seen this and other similar issues discussed online, but I haven’t found a solution that works for me yet.

The other (probably related) issue is that when I do use load_best_model_at_end and early stopping, I get the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 trainer.train()

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1543, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1538     self.model_wrapped = self.model
   1540 inner_training_loop = find_executable_batch_size(
   1541     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1542 )
-> 1543 return inner_training_loop(
   1544     args=args,
   1545     resume_from_checkpoint=resume_from_checkpoint,
   1546     trial=trial,
   1547     ignore_keys_for_eval=ignore_keys_for_eval,
   1548 )

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1868, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1865     self.state.epoch = epoch + (step + 1) / steps_in_epoch
   1866     self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 1868     self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1869 else:
   1870     self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2135, in Trainer._maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
   2132     self._report_to_hp_search(trial, self.state.global_step, metrics)
   2134 if self.control.should_save:
-> 2135     self._save_checkpoint(model, trial, metrics=metrics)
   2136     self.control = self.callback_handler.on_save(self.args, self.state, self.control)

File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2238, in Trainer._save_checkpoint(self, model, trial, metrics)
   2236 if not metric_to_check.startswith("eval_"):
   2237     metric_to_check = f"eval_{metric_to_check}"
-> 2238 metric_value = metrics[metric_to_check]
   2240 operator = np.greater if self.args.greater_is_better else np.less
   2241 if (
   2242     self.state.best_metric is None
   2243     or self.state.best_model_checkpoint is None
   2244     or operator(metric_value, self.state.best_metric)
   2245 ):

KeyError: 'eval_loss'

When I train the model without evaluation during training and evaluate afterwards, I get the following output, so there are no metrics or loss here either:

early stopping required metric_for_best_model, but did not find eval_loss so early stopping is disabled
{'eval_runtime': 8.2721,
 'eval_samples_per_second': 146.396,
 'eval_steps_per_second': 3.143}

I worked my way through the Trainer code with a debugger and eventually got to the evaluation loop. In line 3110 of trainer.py it is specified that metrics are only computed when self.compute_metrics is not None, all_preds is not None, and all_labels is not None. But for some reason all_labels is indeed None.
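To double-check, I can inspect the tokenized evaluation set to see which columns the Trainer actually receives (assuming tokenized_texts is a datasets.DatasetDict, as in the code below):

# Sanity check: the Trainer only gathers labels for compute_metrics when it
# finds the expected label key among the model inputs; otherwise all_labels
# stays None and compute_metrics is silently skipped.
print(tokenized_texts["val"].column_names)   # e.g. ["input_ids", "attention_mask", "labels"]
print(tokenized_texts["val"][0].keys())      # fields of a single example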

That is how far I got today. Can anyone tell me what the underlying problem could be here?


from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    # pred.label_ids are the gold labels collected by the Trainer,
    # pred.predictions are the raw logits per class
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='micro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

training_args = TrainingArguments(
    output_dir="./results/",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    metric_for_best_model='eval_loss',
    load_best_model_at_end=True,
    do_train=True,
    do_eval=True,
    evaluation_strategy='steps',
    logging_dir="./logs/",
    logging_steps=logging_steps,
    eval_steps=logging_steps,
    save_steps=logging_steps,
    save_total_limit=3,
    label_names=list(label_dict.keys()),
    no_cuda=False
)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_texts["train"],
    eval_dataset=tokenized_texts["val"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

Just replying for anyone who has a similar problem: I was able to fix it by not using label_names. I must have misunderstood how label_names is meant to be used; I thought you could provide descriptive names for the classes instead of the labels 0, 1, 2, …, k.
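For reference, if the goal is descriptive class names, the usual place for them seems to be the model config via id2label/label2id rather than TrainingArguments.label_names. A minimal sketch (the checkpoint name and the contents of label_dict below are just placeholders):

from transformers import AutoModelForSequenceClassification

# Hypothetical mapping, e.g. label_dict = {"negative": 0, "neutral": 1, "positive": 2}
id2label = {i: name for name, i in label_dict.items()}
label2id = {name: i for name, i in label_dict.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",       # placeholder checkpoint
    num_labels=len(label_dict),
    id2label=id2label,               # descriptive class names live in the model config,
    label2id=label2id,               # not in TrainingArguments.label_names
)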


Hi @bangsandglasses, I am also stuck on this problem. Can you tell me more about how you solved it? The part about not using label_names is not very clear to me.

label_names means the names of the input columns that the Trainer treats as labels during evaluation, not descriptive names for the classes. If you pass class names that don’t exist as columns in your dataset, the Trainer cannot collect any labels, so no eval loss or metrics are computed.
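As a rough illustration (my own sketch, not from the thread): the entries in label_names have to match actual input keys, and in most cases you can simply leave it unset so the default "labels" key is used.

from transformers import TrainingArguments

# label_names refers to input keys / dataset columns, NOT class names.
# Leaving it out is equivalent to the default below:
training_args = TrainingArguments(
    output_dir="./results/",
    evaluation_strategy="steps",
    label_names=["labels"],   # only needed if your label column has a different name
)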