fenil
March 3, 2021, 7:46am
I was fine-tuning BertForQuestionAnswering on the nlp SQuAD dataset with the following arguments:
training_args = TrainingArguments(
    "test-qa-squad",
    learning_rate=2e-5,
    weight_decay=0.01,
    label_names=["start_positions", "end_positions"],
    num_train_epochs=5,
    load_best_model_at_end=True,
    evaluation_strategy='epoch'
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dl,
    eval_dataset=train_dl
)
Calling trainer.train() then trains for some batches, but at a specific batch it throws this error (before even one epoch is complete):
KeyError Traceback (most recent call last)
<ipython-input-19-3435b262f1ae> in <module>()
----> 1 trainer.train()
3 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
1444 if isinstance(k, str):
1445 inner_dict = {k: v for (k, v) in self.items()}
-> 1446 return inner_dict[k]
1447 else:
1448 return self.to_tuple()[k]
KeyError: 'loss'
Is this some issue in the dataset? Any help is much appreciated.
You should double-check that your dataset's items are dictionaries with the keys "start_positions" and "end_positions" (if they are missing, that may be why the model is not returning the loss).
Also, you seem to be passing dataloaders to the Trainer? It takes datasets.
Lastly, for easy debugging you can do the following:
for batch in trainer.get_train_dataloader():
    break
batch = {k: v.cuda() for k, v in batch.items()}
outputs = trainer.model(**batch)
to easily inspect what’s in your batch and your outputs.
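Since the error above only appears after some batches have trained, it can also help to scan the whole dataset for items that lack a label key. The following is a minimal sketch, not part of the Trainer API: `find_items_missing_labels` is a hypothetical helper, and it assumes the dataset supports `len()` and integer indexing and that each item is a dictionary.

```python
def find_items_missing_labels(dataset, label_names):
    """Return (index, missing_keys) for every item that lacks a label key.

    Hypothetical helper: assumes `dataset` supports len() and integer
    indexing, and that every item is a dict of features.
    """
    problems = []
    for idx in range(len(dataset)):
        item = dataset[idx]
        missing = [name for name in label_names if name not in item]
        if missing:
            problems.append((idx, missing))
    return problems


# Stand-in data: a plain list of dicts plays the role of the dataset here.
rows = [
    {"input_ids": [1, 2], "start_positions": 0, "end_positions": 1},
    {"input_ids": [3, 4], "start_positions": 2},  # no "end_positions"
]
print(find_items_missing_labels(rows, ["start_positions", "end_positions"]))
```

An empty result would suggest the label keys are present on every item, in which case inspecting a single batch as above is the next thing to try.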
mjc00
March 17, 2022, 7:30pm
@sgugger I am running into a similar problem (KeyError: 'loss'). My dataset does have its items as dictionaries (see image),
and my code is as follows:
from transformers import Trainer, TrainingArguments
batch_size = 64
logging_steps = len(dataset["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-test"
training_args = TrainingArguments(output_dir=model_name,
                                  num_train_epochs=2,
                                  learning_rate=2e-5,
                                  per_device_train_batch_size=batch_size,
                                  per_device_eval_batch_size=batch_size,
                                  weight_decay=0.01,
                                  evaluation_strategy="epoch",
                                  disable_tqdm=False,
                                  logging_steps=logging_steps,
                                  label_names=['CategoryCode'],
                                  #push_to_hub=True,
                                  log_level="error")
trainer = Trainer(model=model,
                  args=training_args,
                  compute_metrics=compute_metrics,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["vald"],
                  tokenizer=tokenizer)
trainer.train();
Note: I am running the above code locally on a Mac M1.