KeyError: 'loss' while fine-tuning GPT-2 with the Trainer utility

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    logging_dir='./logs',            # directory for storing logs
)

trainer = Trainer(

Error Log:

/usr/local/lib/python3.6/dist-packages/transformers/ in train(self, model_path, trial)
    745 tr_loss += self.training_step(model, inputs)
    746 else:
--> 747 tr_loss += self.training_step(model, inputs)
    748 self._total_flos += self.floating_point_ops(inputs)

/usr/local/lib/python3.6/dist-packages/transformers/ in training_step(self, model, inputs)
    1073 loss = self.compute_loss(model, inputs)
    1074 else:
--> 1075 loss = self.compute_loss(model, inputs)
    1077 if self.args.n_gpu > 1:

/usr/local/lib/python3.6/dist-packages/transformers/ in compute_loss(self, model, inputs)
    1103 self._past = outputs[self.args.past_index]
    1104 # We don't use .loss here since the model may return tuples instead of ModelOutput.
--> 1105 return outputs["loss"] if isinstance(outputs, dict) else outputs[0]
    1107 def is_local_process_zero(self) -> bool:

/usr/local/lib/python3.6/dist-packages/transformers/ in __getitem__(self, k)
    1356 if isinstance(k, str):
    1357 inner_dict = {k: v for (k, v) in self.items()}
--> 1358 return inner_dict[k]
    1359 else:
    1360 return self.to_tuple()[k]

KeyError: 'loss'

If you have this error, it’s probably because you are not passing any labels to your model. It’s hard to know for sure since you don’t explain how you built your dataset.

Input data is a text file with entries differentiated by a newline in the following sequence:
<|startoftext|> sentence1… <|endoftext|>
<|startoftext|> sentence2… <|endoftext|>
<|startoftext|> sentence3… <|endoftext|>

This is the input to the Trainer:

train: Dataset({
    features: ['attention_mask', 'input_ids'],
    num_rows: 25
})
test: Dataset({
    features: ['attention_mask', 'input_ids'],
    num_rows: 10
})

So there are no labels, which is why it can’t train: the model only returns a loss when labels are passed to it.
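
For causal language modeling the labels are usually just a copy of the input IDs, because the model shifts them internally when it computes the loss. Here is a minimal sketch of that idea, assuming the tokenized columns shown above. `add_labels` is a hypothetical helper name and the token IDs are made up; it works on plain dicts of lists, which is the shape a batched `Dataset.map` call hands to its function.

```python
def add_labels(batch):
    """Return a copy of the batch with a 'labels' column copied from 'input_ids'."""
    out = dict(batch)
    # Copy each sequence so later edits to the labels do not touch the inputs.
    out["labels"] = [list(ids) for ids in batch["input_ids"]]
    return out


# Made-up token IDs, only to show the shape of the data.
batch = {
    "input_ids": [[50257, 318, 257], [50257, 1212, 50256]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
labeled = add_labels(batch)
print(sorted(labeled))  # ['attention_mask', 'input_ids', 'labels']
```

With the datasets library this would presumably be applied as `tokenized.map(add_labels, batched=True)`, where `tokenized` stands for the train/test datasets above (the variable name is an assumption). Passing `DataCollatorForLanguageModeling(tokenizer, mlm=False)` to the Trainer is another way to get labels built from the input IDs.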

Thanks! Could you point me to some references on adding labels?

The official examples have a fine-tuning script for causal models like GPT-2 and there is also a notebook with an example.

I had left out the function and this line. I skipped them because I was using a very small dataset. It works now! Thank you so much for your help!

Hmm… would you have any idea why the error still occurs when the labels do exist (even when the label_names argument is passed)?

Something like this:

{'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'input_ids': [0, 18764, 9665, 38, 3572, 29228, 700, 5029, 102, 2],
 'src': 'Mizoram is ........',
 'tgt': 19}

I had a similar issue. After I changed the label key to exactly the word 'labels', it worked.
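
As a rough sketch of that rename, applied to the example row posted above: `rename_label_key` is a hypothetical helper, not something from the library, and it only works on a single plain dict.

```python
def rename_label_key(example, old_key="tgt", new_key="labels"):
    """Return a copy of one example with the label stored under new_key."""
    renamed = {k: v for k, v in example.items() if k != old_key}
    renamed[new_key] = example[old_key]
    return renamed


example = {
    "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "input_ids": [0, 18764, 9665, 38, 3572, 29228, 700, 5029, 102, 2],
    "src": "Mizoram is ........",
    "tgt": 19,
}
fixed = rename_label_key(example)
print(fixed["labels"])  # 19
```

On a whole dataset, `Dataset.rename_column("tgt", "labels")` from the datasets library should do the same thing in one call.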

But what is the reason that it only works with the word "labels" or "label"?
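
A likely explanation, as far as I understand the Trainer: each batch is passed to the model as keyword arguments, so only keys that match parameter names of the model's forward method reach it, and the built-in models name that parameter `labels` (the default data collator also maps `label` and `label_ids` to `labels`). The toy below is not the library's code. `toy_forward` and `call_like_trainer` are hypothetical stand-ins that imitate this behavior with `inspect.signature`, under the assumption that unmatched columns are dropped silently.

```python
import inspect


def toy_forward(input_ids, attention_mask=None, labels=None):
    """Stand-in for a model's forward: a loss is returned only if labels are given."""
    outputs = {"logits": [0.0] * len(input_ids)}
    if labels is not None:
        outputs["loss"] = 0.0  # placeholder value, not a real loss
    return outputs


def call_like_trainer(forward, inputs):
    """Drop keys the forward signature does not accept, then call with keywords."""
    accepted = set(inspect.signature(forward).parameters)
    kept = {k: v for k, v in inputs.items() if k in accepted}
    return forward(**kept)


with_tgt = call_like_trainer(toy_forward, {"input_ids": [0, 2], "tgt": 19})
with_labels = call_like_trainer(toy_forward, {"input_ids": [0, 2], "labels": 19})
print("loss" in with_tgt, "loss" in with_labels)  # False True
```

In this toy, the `tgt` key never reaches the forward call, so the output has no `loss` entry, and looking it up would raise the same KeyError as in the traceback.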