Why does trainer.train() give me KeyError: 'loss'? I previously used markers like start_text and stop_text, and I read in a previous solution that these were the cause of the error, so I deleted them, but it still gives the same error. Does anyone have a solution? Thanks
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelWithLMHead.from_pretrained("distilgpt2")

from datasets import Dataset

dataset = Dataset.from_text('/content/drive/MyDrive/Colab_Notebooks/qna.txt')

tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='/content/drive/MyDrive/Colab_Notebooks/GPT_checkpoint',  # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=1,   # batch size per device during training
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='/content/drive/MyDrive/Colab_Notebooks/GPT_checkpoint/logs',  # directory for storing logs
    logging_steps=10,
)

trainer = Trainer(
    model=model,                     # the instantiated 🤗 Transformers model to be trained
    args=training_args,              # training arguments, defined above
    train_dataset=tokenized_datasets,
)
Here is the dataset sample:
Was Volta an Italian physicist?
yes
Was Volta an Italian physicist?
yes
Is Volta buried in the city of Pittsburgh?
no
Is Volta buried in the city of Pittsburgh?
no
Here is the full error message:
KeyError                                  Traceback (most recent call last)
<ipython-input-17-3435b262f1ae> in <module>()
----> 1 trainer.train()

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
   1804         if isinstance(k, str):
   1805             inner_dict = {k: v for (k, v) in self.items()}
-> 1806             return inner_dict[k]
   1807         else:
   1808             return self.to_tuple()[k]

KeyError: 'loss'
There are no labels in your dataset, so it can’t train (and the model does not produce a loss, hence your error). Maybe you wanted to use DataCollatorForLanguageModeling to generate those labels automatically?
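A minimal sketch of that fix, reusing the names from the question above (tokenizer, model, training_args, tokenized_datasets); DataCollatorForLanguageModeling with mlm=False is the causal-LM collator, which copies input_ids into a labels field so the model can return a loss:

from transformers import DataCollatorForLanguageModeling

# mlm=False switches the collator to causal language modeling: it clones
# input_ids into labels (padding positions become -100 and are ignored by the loss).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    data_collator=data_collator,  # generates the labels the model needs
)
trainer.train()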
Hi, I ran into the same situation. I found that Trainer.label_smoother is None, so the Trainer class didn’t calculate the loss itself; I don’t know how to deal with this.
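For context, a simplified paraphrase (not the exact source) of what Trainer does when no label_smoother is set: it just reads the loss the model returned, and that key only exists when the batch contained labels.

# Paraphrased from Trainer.compute_loss, heavily simplified:
outputs = model(**inputs)
loss = outputs["loss"]  # raises KeyError: 'loss' when the model received no labels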
Hey @sgugger, and thank you for the great transformers library. I have the same error while trying to fine-tune facebook/bart-large-cnn for a summarization task. My dataset (after tokenization) looks like this:
DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'summary', 'text'],
        num_rows: 10980
    })
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'summary', 'text'],
        num_rows: 1161
    })
})
and I am using this line to load the model:
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
and training arguments like:
training_args = TrainingArguments("test_trainer")
I also get this message when calling trainer.train():
The following columns in the training set don’t have a corresponding argument in BartForConditionalGeneration.forward and have been ignored: text, summary.
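That warning is the clue: the raw text and summary columns are dropped before they reach the model, and nothing was tokenized into a labels column, so the model cannot compute a loss. A sketch of one way to fix it, assuming the raw columns are named text and summary, the DatasetDict is called raw_datasets, and tokenizer is the matching AutoTokenizer for facebook/bart-base (all of these names are assumptions, adjust to your data):

from transformers import DataCollatorForSeq2Seq

def preprocess(examples):
    # Tokenize the inputs.
    model_inputs = tokenizer(examples["text"], truncation=True, max_length=1024)
    # Tokenize the targets; the tokenized summaries become the "labels" column,
    # which is what BartForConditionalGeneration.forward needs to return a loss.
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["summary"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw_datasets.map(preprocess, batched=True, remove_columns=["text", "summary"])

# Pads inputs and labels per batch; label padding uses -100 so it is ignored by the loss.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)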
Hello, I have the same problem and debugged into the code to find out the same thing. Have you resolved the issue? I am trying to fine-tune the BertForPreTraining Model.
@pipi, I was facing the exact same issue and fixed it by just renaming the column that holds the labels in my dataset to “label”, i.e. in your case you can change “labels” to “label”, and the Trainer should then run fine.
It was really weird to me that the Trainer expects the column name to be exactly “label”, but the fix worked for me and hopefully it works for you as well.
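A minimal sketch of that rename, assuming a datasets Dataset whose label column is currently called "labels":

# Trainer keeps columns named "label" or "label_ids" even when they are not in the
# model's forward signature, and the default collator renames "label" to "labels",
# so the renamed column survives column removal and reaches the model.
dataset = dataset.rename_column("labels", "label")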