`KeyError: 'eval_loss'` when using Trainer with BertForQA

When I run BertForQuestionAnswering with a Trainer object, it reaches the end of evaluation and then throws KeyError: 'eval_loss' (full traceback below).

My implementation is very vanilla and closely follows the QA section of the "Fine-tuning with custom datasets" tutorial.

The training and validation both finish, but from the traceback, it seems like there is some problem when reporting results.
Am I missing something that should be there? Is this a bug? Is Trainer not supported here?

This is transformers v3.4.0.

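import torch
from transformers import BertForQuestionAnswering, BertTokenizerFast, Trainer, TrainingArguments

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')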

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # self.encodings.keys() = ['input_ids', 'attention_mask', 'start_positions', 'end_positions']
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings.input_ids)

train_dataset = MyDataset(train_encodings)
val_dataset = MyDataset(val_encodings)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

training_args = TrainingArguments(

trainer = Trainer(



KeyError                                  Traceback (most recent call last)
<ipython-input-22-7b137ef43258> in <module>
     20 )
---> 22 trainer.train()
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer.py in train(self, model_path, trial)
    791             self.control = self.callback_handler.on_epoch_end(self.args, self.state, self.control)
--> 792             self._maybe_log_save_evalute(tr_loss, model, trial, epoch)
    794             if self.args.tpu_metrics_debug or self.args.debug:
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer.py in _maybe_log_save_evalute(self, tr_loss, model, trial, epoch)
    843             metrics = self.evaluate()
    844             self._report_to_hp_search(trial, epoch, metrics)
--> 845             self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, metrics)
    847         if self.control.should_save:
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer_callback.py in on_evaluate(self, args, state, control, metrics)
    350     def on_evaluate(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, metrics):
    351         control.should_evaluate = False
--> 352         return self.call_event("on_evaluate", args, state, control, metrics=metrics)
    354     def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer_callback.py in call_event(self, event, args, state, control, **kwargs)
    374                 train_dataloader=self.train_dataloader,
    375                 eval_dataloader=self.eval_dataloader,
--> 376                 **kwargs,
    377             )
    378             # A Callback can skip the return of `control` if it doesn't change it.
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/utils/notebook.py in on_evaluate(self, args, state, control, metrics, **kwargs)
    324             else:
    325                 values["Step"] = state.global_step
--> 326             values["Validation Loss"] = metrics["eval_loss"]
    327             _ = metrics.pop("total_flos", None)
    328             _ = metrics.pop("epoch", None)
KeyError: 'eval_loss'

Trainer is untested on QA problems, and fixing that is actually my work for the end of this week or the beginning of next :slight_smile:
I will take a quick look this morning to see if there is a quick fix for this; otherwise you’ll have to wait a little longer.

Thank you! Great news that it might not be something I am doing wrong. :smiley: Hopefully it is as simple as prepending that eval_ string somewhere. :crossed_fingers:

I’ll stick with the non-Trainer workflow for this week and keep an eye on changes to master.

Oh, one thing that might help: there was a very recent fix to the label names of QA models. Could you try passing label_names = ["start_positions", "end_positions"] in your TrainingArguments?
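To illustrate why this argument could matter, here is a simplified stand-in for the kind of check involved, not the library's actual code. It assumes that an evaluation loss is only reported when every configured label key is present in the batch; `has_labels` and `evaluate_metrics` are hypothetical names, and the loss value is a placeholder.

```python
def has_labels(batch, label_names):
    """Return True only if every expected label key is present in the batch."""
    return all(batch.get(name) is not None for name in label_names)


def evaluate_metrics(batch, label_names):
    """Mimic an evaluation pass: report a loss only when labels are found."""
    metrics = {}
    if has_labels(batch, label_names):
        metrics["eval_loss"] = 0.0  # placeholder value, not a real loss
    return metrics


# A QA batch carries start/end positions rather than a single "labels" key.
qa_batch = {
    "input_ids": [[101, 102]],
    "attention_mask": [[1, 1]],
    "start_positions": [0],
    "end_positions": [1],
}

# Assumed default: the check looks for a single "labels" key.
default_metrics = evaluate_metrics(qa_batch, ["labels"])
# With the QA label names, both keys are found and a loss is reported.
qa_metrics = evaluate_metrics(qa_batch, ["start_positions", "end_positions"])

print("eval_loss" in default_metrics)
print("eval_loss" in qa_metrics)
```

Under that assumption, a batch without a "labels" key yields no eval_loss entry, which is exactly the key the notebook callback in the traceback tries to read.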

Using label_names = ["start_positions", "end_positions"] took care of the problem! Training finishes successfully, and my_trainer.predict() works too.
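For reference, the workaround amounts to one extra keyword argument. The sketch below collects it in a plain dict so that it runs without transformers installed. The `output_dir` value is a placeholder, since the original post does not show its arguments, and `training_kwargs` is a hypothetical name.

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = {
    # Placeholder path; the original post's arguments are not shown.
    "output_dir": "./results",
    # Tell the Trainer which batch keys are labels for a QA model.
    "label_names": ["start_positions", "end_positions"],
}

# With transformers installed, this would be used roughly as:
#   training_args = TrainingArguments(**training_kwargs)
#   trainer = Trainer(model=model, args=training_args,
#                     train_dataset=train_dataset, eval_dataset=val_dataset)
print(training_kwargs["label_names"])
```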

I look forward to whatever QA improvements you add to the Trainer, but this is excellent.

Ok then, if this works, an install from source with the latest master should also work (without passing this argument).

I can confirm that this label_names tweak also worked for me when training a custom multi-task token-classifier model (which has multiple label arguments). I was getting the exact same eval_loss error. Thanks @deppen8!