Hello, I have been racking my brain over this issue for 2 days now and wanted to reach out for some help.
The code that produces the error is attached at the bottom.
TLDR: Why do I get no validation loss and why are metrics not calculated?
So the first problem is that the validation loss column during training says 'No log' the entire time. That is when I don't use metric_for_best_model='eval_loss', load_best_model_at_end=True, and callbacks=[EarlyStoppingCallback(early_stopping_patience=2)].
I have seen this issue and other similar issues online but I couldn’t find a solution that works for me yet.
The other (probably related) issue is that when I do use load_best_model_at_end and early stopping, I get the following error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[14], line 1
----> 1 trainer.train()
File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1543, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1538 self.model_wrapped = self.model
1540 inner_training_loop = find_executable_batch_size(
1541 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1542 )
-> 1543 return inner_training_loop(
1544 args=args,
1545 resume_from_checkpoint=resume_from_checkpoint,
1546 trial=trial,
1547 ignore_keys_for_eval=ignore_keys_for_eval,
1548 )
File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:1868, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1865 self.state.epoch = epoch + (step + 1) / steps_in_epoch
1866 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 1868 self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
1869 else:
1870 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)
File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2135, in Trainer._maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
2132 self._report_to_hp_search(trial, self.state.global_step, metrics)
2134 if self.control.should_save:
-> 2135 self._save_checkpoint(model, trial, metrics=metrics)
2136 self.control = self.callback_handler.on_save(self.args, self.state, self.control)
File /opt/conda/envs/cicero-magnum/lib/python3.9/site-packages/transformers/trainer.py:2238, in Trainer._save_checkpoint(self, model, trial, metrics)
2236 if not metric_to_check.startswith("eval_"):
2237 metric_to_check = f"eval_{metric_to_check}"
-> 2238 metric_value = metrics[metric_to_check]
2240 operator = np.greater if self.args.greater_is_better else np.less
2241 if (
2242 self.state.best_metric is None
2243 or self.state.best_model_checkpoint is None
2244 or operator(metric_value, self.state.best_metric)
2245 ):
KeyError: 'eval_loss'
When I train the model without evaluation during training and evaluate after, I get this output. So no metrics or loss here either.
early stopping required metric_for_best_model, but did not find eval_loss so early stopping is disabled
{'eval_runtime': 8.2721,
'eval_samples_per_second': 146.396,
'eval_steps_per_second': 3.143}
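To make the connection between the two symptoms concrete, here is a minimal sketch of the lookup shown at lines 2236-2238 of the traceback, applied to the three keys my evaluation returned. The function name lookup_best_metric is hypothetical and only mirrors those traceback lines; it is not part of transformers.

```python
# Hypothetical stand-in for the metric lookup in Trainer._save_checkpoint
# (trainer.py lines 2236-2238 in the traceback above); not the library code itself.
def lookup_best_metric(metrics, metric_for_best_model):
    metric_to_check = metric_for_best_model
    # The Trainer prefixes the metric name with "eval_" when it is missing.
    if not metric_to_check.startswith("eval_"):
        metric_to_check = f"eval_{metric_to_check}"
    # Plain dict indexing: raises KeyError when the key is absent.
    return metrics[metric_to_check]


# The only keys my evaluation run returned (copied from the output above).
metrics = {
    "eval_runtime": 8.2721,
    "eval_samples_per_second": 146.396,
    "eval_steps_per_second": 3.143,
}

try:
    lookup_best_metric(metrics, "eval_loss")
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'eval_loss'
```

So as far as I can tell, the KeyError is just a consequence of the evaluation never producing eval_loss in the first place, rather than a separate bug.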
I worked my way through the trainer code with a debugger and eventually got to the evaluation loop. Line 3110 of trainer.py specifies that metrics are only computed when self.compute_metrics is not None and all_preds is not None and all_labels is not None. But for some reason all_labels is indeed None.
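The condition I found can be sketched roughly like this. This is a simplified illustration of the check as I read it in the debugger, not the actual Trainer code: the helper maybe_compute_metrics and the toy accuracy function are made up for the example, and the real loop wraps predictions and labels in an EvalPrediction object before calling compute_metrics.

```python
# Simplified, hypothetical version of the guard in the evaluation loop.
def maybe_compute_metrics(compute_metrics, all_preds, all_labels):
    # Metrics are only computed when all three pieces are available.
    if compute_metrics is not None and all_preds is not None and all_labels is not None:
        return compute_metrics(all_preds, all_labels)
    # Otherwise the metrics dict stays empty, so no accuracy, f1, etc.
    return {}


# Toy metric function used only for this illustration.
def accuracy(preds, labels):
    correct = sum(p == l for p, l in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


# With labels present, the metric function is called.
print(maybe_compute_metrics(accuracy, [1, 0, 1, 1], [1, 1, 1, 1]))  # {'accuracy': 0.75}

# With all_labels being None (my situation), nothing is computed.
print(maybe_compute_metrics(accuracy, [1, 0, 1, 1], None))  # {}
```

This matches what I see: compute_metrics is set and predictions exist, but because all_labels is None the whole branch is skipped.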
That is how far I got today. Can anyone tell me what the underlying problem could be here?
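from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

def compute_metrics(pred):
    # pred carries the true label ids and the raw model outputs (logits)
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='micro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }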
training_args = TrainingArguments(
    output_dir="./results/",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    metric_for_best_model='eval_loss',
    load_best_model_at_end=True,
    do_train=True,
    do_eval=True,
    evaluation_strategy='steps',
    logging_dir="./logs/",
    logging_steps=logging_steps,
    eval_steps=logging_steps,
    save_steps=logging_steps,
    save_total_limit=3,
    label_names=list(label_dict.keys()),
    no_cuda=False
)
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_texts["train"],
    eval_dataset=tokenized_texts["val"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)