fenil
March 3, 2021, 7:46am
1
I was fine-tuning BertForQuestionAnswering on the nlp SQuAD dataset with the following arguments:
training_args = TrainingArguments(
    "test-qa-squad",
    learning_rate=2e-5,
    weight_decay=0.01,
    label_names=["start_positions", "end_positions"],
    num_train_epochs=5,
    load_best_model_at_end=True,
    evaluation_strategy='epoch'
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dl,
    eval_dataset=train_dl
)
Then calling trainer.train() trains for some batches, but after a specific batch it throws this error (before the first epoch is complete):
KeyError Traceback (most recent call last)
<ipython-input-19-3435b262f1ae> in <module>()
----> 1 trainer.train()
3 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
1444 if isinstance(k, str):
1445 inner_dict = {k: v for (k, v) in self.items()}
-> 1446 return inner_dict[k]
1447 else:
1448 return self.to_tuple()[k]
KeyError: 'loss'
Is this some issue in the dataset? Any help is much appreciated
You should double check that your dataset has items that are dictionaries with the keys "start_positions" and "end_positions"
(if they are missing, that may be why the model is not returning the loss).
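One way to run that check is to scan the dataset for items that lack a label key. This is only a minimal sketch, assuming the dataset is an iterable of dict-like items; the helper name find_items_missing_labels and the toy values are made up for illustration and are not part of transformers:

```python
def find_items_missing_labels(dataset, label_names):
    """Return (index, missing_keys) for every item lacking a label key."""
    problems = []
    for index, item in enumerate(dataset):
        missing = [name for name in label_names if name not in item]
        if missing:
            problems.append((index, missing))
    return problems


# Toy data standing in for tokenized SQuAD features (values are made up).
toy_dataset = [
    {"input_ids": [101, 2054, 102], "start_positions": 1, "end_positions": 1},
    {"input_ids": [101, 2129, 102], "start_positions": 0},  # no end_positions
]
problems = find_items_missing_labels(
    toy_dataset, ["start_positions", "end_positions"]
)
print(problems)
```

An empty result would suggest the label keys are present everywhere, so the cause is probably elsewhere.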
Also, you seem to be passing dataloaders to the Trainer? It takes datasets.
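A dataset here means an object that supports len() and integer indexing and returns one example per index; the Trainer then builds the dataloader itself. Below is a rough sketch, assuming the features are already tokenized into a dict of equal-length lists. The class name QADataset and the toy values are hypothetical, and in practice the class would usually subclass torch.utils.data.Dataset:

```python
class QADataset:
    """Map-style dataset: one dict per example, including the label keys."""

    def __init__(self, encodings):
        # encodings maps each feature name to a list with one entry per example.
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, index):
        # Pick the entry for this example out of every feature list.
        return {key: values[index] for key, values in self.encodings.items()}


# Made-up token ids and answer positions, for illustration only.
encodings = {
    "input_ids": [[101, 2054, 102], [101, 2129, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
    "start_positions": [1, 0],
    "end_positions": [1, 2],
}
train_dataset = QADataset(encodings)
first_item = train_dataset[0]
print(len(train_dataset), sorted(first_item))
```

An object like train_dataset, rather than a DataLoader wrapping it, is what would go into the train_dataset argument.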
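Lastly, for easy debugging you can do the following:
# Grab a single batch from the training dataloader.
for batch in trainer.get_train_dataloader():
    break
# Move the tensors to the same device as the model.
batch = {k: v.to(trainer.args.device) for k, v in batch.items()}
outputs = trainer.model(**batch)
to easily inspect what’s in your batch and your outputs.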
mjc00
March 17, 2022, 7:30pm
3
@sgugger I am running into a similar problem (KeyError: 'loss'). My dataset does have its items as dictionaries (see image), and my code is as follows:
from transformers import Trainer, TrainingArguments

batch_size = 64
logging_steps = len(dataset["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-test"
training_args = TrainingArguments(output_dir=model_name,
                                  num_train_epochs=2,
                                  learning_rate=2e-5,
                                  per_device_train_batch_size=batch_size,
                                  per_device_eval_batch_size=batch_size,
                                  weight_decay=0.01,
                                  evaluation_strategy="epoch",
                                  disable_tqdm=False,
                                  logging_steps=logging_steps,
                                  label_names=['CategoryCode'],
                                  #push_to_hub=True,
                                  log_level="error")
trainer = Trainer(model=model,
                  args=training_args,
                  compute_metrics=compute_metrics,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["vald"],
                  tokenizer=tokenizer)
trainer.train();
Note: I am running the above code locally on a Mac M1.