`KeyError: 'eval_loss'` when using Trainer with BertForQA

When I run BertForQuestionAnswering with a Trainer object, evaluation runs to the end and then throws KeyError: 'eval_loss' (full traceback below).

I ran a very vanilla implementation based very closely on the Fine-tuning with custom datasets QA tutorial.

The training and validation both finish, but from the traceback, it seems like there is some problem when reporting results.
Am I missing something that should be there? Is this a bug? Is Trainer not supported here?

This is transformers v3.4.0.

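import torch
from transformers import BertForQuestionAnswering, BertTokenizerFast, Trainer, TrainingArguments

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')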
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # self.encodings.keys() = ['input_ids', 'attention_mask', 'start_positions', 'end_positions']
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings.input_ids)

train_dataset = MyDataset(train_encodings)
val_dataset = MyDataset(val_encodings)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

training_args = TrainingArguments(

trainer = Trainer(



KeyError                                  Traceback (most recent call last)
<ipython-input-22-7b137ef43258> in <module>
     20 )
---> 22 trainer.train()
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer.py in train(self, model_path, trial)
    791             self.control = self.callback_handler.on_epoch_end(self.args, self.state, self.control)
--> 792             self._maybe_log_save_evalute(tr_loss, model, trial, epoch)
    794             if self.args.tpu_metrics_debug or self.args.debug:
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer.py in _maybe_log_save_evalute(self, tr_loss, model, trial, epoch)
    843             metrics = self.evaluate()
    844             self._report_to_hp_search(trial, epoch, metrics)
--> 845             self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, metrics)
    847         if self.control.should_save:
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer_callback.py in on_evaluate(self, args, state, control, metrics)
    350     def on_evaluate(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, metrics):
    351         control.should_evaluate = False
--> 352         return self.call_event("on_evaluate", args, state, control, metrics=metrics)
    354     def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/trainer_callback.py in call_event(self, event, args, state, control, **kwargs)
    374                 train_dataloader=self.train_dataloader,
    375                 eval_dataloader=self.eval_dataloader,
--> 376                 **kwargs,
    377             )
    378             # A Callback can skip the return of `control` if it doesn't change it.
~/SageMaker/conda_env/my_env/lib/python3.7/site-packages/transformers/utils/notebook.py in on_evaluate(self, args, state, control, metrics, **kwargs)
    324             else:
    325                 values["Step"] = state.global_step
--> 326             values["Validation Loss"] = metrics["eval_loss"]
    327             _ = metrics.pop("total_flos", None)
    328             _ = metrics.pop("epoch", None)
KeyError: 'eval_loss'

Trainer is untested on QA problems, and fixing that is actually my work for the end of this week/beginning of next :slight_smile:
Will give a quick look this morning to see if there is a way to have a quick fix for this, otherwise you’ll have to wait a tiny bit more.

Thank you! Great news that it might not be something I am doing wrong. :smiley: Hopefully it is as simple as prepending that eval_ string somewhere. :crossed_fingers:

I’ll stick with the non-Trainer workflow for this week and keep an eye on changes to master.

Oh, one thing that might help is a very recent fix to the label names of QA models. Could you try passing label_names = ["start_positions", "end_positions"] in your TrainingArguments?
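For reference, a minimal sketch of where that argument would go. Only label_names comes from this thread; output_dir is a placeholder path, and the commented-out lines assume transformers is installed.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# Only label_names is taken from the thread; output_dir is a hypothetical path.
training_kwargs = {
    "output_dir": "./results",
    # Tell the Trainer which batch keys are labels for a QA model.
    "label_names": ["start_positions", "end_positions"],
}

# With transformers installed, the arguments would be built like this:
# from transformers import TrainingArguments
# training_args = TrainingArguments(**training_kwargs)
```

The rest of the TrainingArguments call stays whatever it was; label_names is the only addition.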

Using label_names = ["start_positions", "end_positions"] looks like it took care of the problem! Training finishes successfully, and my_trainer.predict() works too.

I look forward to whatever QA improvements you add to the Trainer, but this is excellent.

Ok then, if this works, an install from source with the latest master should also work (without passing this argument).

I can confirm that this label_names tweak also worked for me when training a custom multi-task token-classifier model (which has multiple label arguments). I was getting the exact same eval_loss error. Thanks @deppen8!
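A toy model of why the argument matters, for anyone hitting this with custom label keys. This is not the library's code. It assumes that the Trainer records an evaluation loss only when every name in label_names is present in the batch, and that on the affected version the default names did not match the QA keys. The function names and the constant loss value are made up for illustration.

```python
def batch_has_labels(batch, label_names):
    """True only if every expected label key is present in the batch."""
    return all(batch.get(name) is not None for name in label_names)


def toy_evaluate(batches, label_names):
    """Collect a loss per batch only when the batch is recognized as labeled."""
    losses = []
    for batch in batches:
        if batch_has_labels(batch, label_names):
            losses.append(1.0)  # stand-in for the model's real loss
    metrics = {}
    if losses:  # with no losses collected, the key is never written
        metrics["eval_loss"] = sum(losses) / len(losses)
    return metrics


# Toy QA-style batches using the same keys as the dataset in the first post.
batches = [
    {"input_ids": [101, 102], "attention_mask": [1, 1],
     "start_positions": 0, "end_positions": 1},
]

# Label names that do not match the batch keys: no "eval_loss" is produced.
metrics_mismatched = toy_evaluate(batches, ["labels"])

# Label names that match the batch keys: "eval_loss" is reported.
metrics_matched = toy_evaluate(batches, ["start_positions", "end_positions"])
```

Under these assumptions, any callback that reads metrics["eval_loss"] fails in the mismatched case, which would match the traceback above. The same logic would apply to a multi-task model whose labels arrive under several custom keys.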

@sgugger I'm still facing this issue. Could you let me know if this has been solved?