Why am I getting KeyError: 'loss'?

Why does trainer.train() give me KeyError: 'loss'? Previously I used something like start_text and stop_text, and I read in an earlier solution that this was the cause of the error, so I deleted it, but I still get the same error. Do you have any solution? Thanks

from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelWithLMHead.from_pretrained("distilgpt2")

from datasets import Dataset
dataset = Dataset.from_text('/content/drive/MyDrive/Colab_Notebooks/qna.txt')

tokenizer.pad_token = tokenizer.eos_token
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='/content/drive/MyDrive/Colab_Notebooks/GPT_checkpoint',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=1,  # batch size per device during training
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='/content/drive/MyDrive/Colab_Notebooks/GPT_checkpoint/logs',            # directory for storing logs
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=tokenized_datasets,    # tokenized training data (note: it has no "labels" column)
)

Here is the dataset sample:

Was Volta an Italian physicist?
Is Volta buried in the city of Pittsburgh?

Here is the full error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-17-3435b262f1ae> in <module>()
----> 1 trainer.train()

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
   1804         if isinstance(k, str):
   1805             inner_dict = {k: v for (k, v) in self.items()}
-> 1806             return inner_dict[k]
   1807         else:
   1808             return self.to_tuple()[k]

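KeyError: 'loss'

There are no labels in your dataset, so the model cannot compute a loss and its output has no 'loss' entry, hence your error. Maybe you wanted to use a data collator to generate those labels automatically? The class for that is DataCollatorForLanguageModeling (there is no DataCollatorForMaskedLM); for a causal model such as distilgpt2, create it with mlm=False.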
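As a rough illustration of what "adding labels" means for a causal language model, here is a minimal sketch in plain Python. It copies input_ids into a new labels column and masks padded positions with -100, the value the loss ignores by default. The function name add_causal_lm_labels is hypothetical, and the sketch assumes the batch already contains input_ids and attention_mask, as the tokenized dataset in the question does. GPT-2-style models shift the labels internally, so no manual shifting is done here.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss by default


def add_causal_lm_labels(examples):
    """Build a 'labels' column from input_ids, masking padded positions.

    `examples` is a batch in the shape produced by a batched tokenizer call:
    a dict of lists with 'input_ids' and 'attention_mask'.
    """
    labels = [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(examples["input_ids"], examples["attention_mask"])
    ]
    # Returning only the new column lets Dataset.map merge it with the existing ones.
    return {"labels": labels}


# Hypothetical usage with the variables from the question:
# tokenized_datasets = tokenized_datasets.map(add_causal_lm_labels, batched=True)

batch = {"input_ids": [[5, 6, 7, 0, 0]], "attention_mask": [[1, 1, 1, 0, 0]]}
print(add_causal_lm_labels(batch)["labels"])  # [[5, 6, 7, -100, -100]]
```

Masking by attention_mask rather than by pad token id is a deliberate choice in this sketch: the question sets pad_token to eos_token, so masking by token id would also hide genuine end-of-text tokens.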

My dataset has labels, but I also get KeyError: 'loss'.

>>> d = next(iter(train_loader))
>>> d.keys()
dict_keys(['input_ids', 'attention_mask', 'labels'])
now exiting InteractiveConsole...
[INFO|trainer.py:1202] 2021-12-03 18:24:14,156 >> ***** Running training *****
[INFO|trainer.py:1203] 2021-12-03 18:24:14,156 >>   Num examples = 6667
[INFO|trainer.py:1204] 2021-12-03 18:24:14,156 >>   Num Epochs = 3
[INFO|trainer.py:1205] 2021-12-03 18:24:14,156 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1206] 2021-12-03 18:24:14,157 >>   Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:1207] 2021-12-03 18:24:14,157 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1208] 2021-12-03 18:24:14,157 >>   Total optimization steps = 20001
  0%|          | 0/20001 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_train.py", line 90, in <module>
  File "run_train.py", line 85, in main
  File "/usr/local/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1323, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1861, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1905, in compute_loss
    loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
  File "/usr/local/anaconda3/lib/python3.7/site-packages/transformers/file_utils.py", line 2125, in __getitem__
    return inner_dict[k]
KeyError: 'loss'

@sgugger is there any advice?

You should debug the training step by step as highlighted in this course chapter.

Hi, I ran into the same situation. I found that Trainer.label_smoother is None, so the Trainer class doesn't calculate the loss itself and relies on the model to return one. I don't know how to deal with this.

Hey @sgugger, and thank you for the great transformers library. I get the same error when I try to fine-tune facebook/bart-large-cnn for a summarization task. My dataset (after tokenization) looks like this:
train: Dataset({
    features: ['attention_mask', 'input_ids', 'summary', 'text'],
    num_rows: 10980
})
test: Dataset({
    features: ['attention_mask', 'input_ids', 'summary', 'text'],
    num_rows: 1161
})

and I am using this line to get the model:
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
and training arguments like:
training_args = TrainingArguments("test_trainer")

I am also getting this message when I call trainer.train():
The following columns in the training set don't have a corresponding argument in BartForConditionalGeneration.forward and have been ignored: text, summary.

Can you please guide me?
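The feature list above has no labels column, and the warning says the raw summary column is ignored, which matches the explanation earlier in the thread: without labels the model returns no loss. A possible preprocessing sketch follows. The function name preprocess and the two length limits are assumptions for illustration, and the text_target argument assumes a reasonably recent transformers version.

```python
def preprocess(examples, tokenizer, max_input_len=1024, max_target_len=128):
    """Tokenize articles as model inputs and summaries as labels.

    The default lengths are illustrative assumptions, not values from the thread.
    """
    model_inputs = tokenizer(
        examples["text"], max_length=max_input_len, truncation=True
    )
    # text_target tells the tokenizer to treat the strings as decoder targets.
    targets = tokenizer(
        text_target=examples["summary"], max_length=max_target_len, truncation=True
    )
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs


# Hypothetical usage with the objects from the post above:
# tokenized = dataset.map(lambda ex: preprocess(ex, tokenizer), batched=True)
# collator = DataCollatorForSeq2Seq(tokenizer, model=model)  # pads labels with -100
```

Padding is left to the collator here so that label padding is filled with -100 and does not contribute to the loss.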

I would suggest using 0/1 instead of yes/no. Maybe that helps.

This video helped me find a solution to my problem.

