Good morning,
I have a strange error that gets thrown once my training process is a few percent underway:

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,token_type_ids,attention_mask.
What is unusual is that, as you can see from the terminal trace below, the training process has been calculating and logging the loss successfully for many iterations. At some point it is apparently no longer able to compute the loss, even though it had done so without issue for more than a thousand steps beforehand.
I am training a version of the BioBERT model on a custom dataset.
I am loading the full dataset and splitting out one study for evaluation.
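Roughly, the loading and split step looks like the sketch below; the file name, the column names, and the filter on study are simplified stand-ins rather than my exact code, and tokenization (with the tokenizer shown further down) is applied to both splits afterwards with .map(...).

    from datasets import load_dataset

    # Simplified sketch only -- the file name and column names are placeholders.
    raw = load_dataset("csv", data_files="sentences.csv")        # full dataset, single 'train' split
    train_dataset = raw.filter(lambda ex: ex["study"] != study)  # every study except the held-out one
    eval_dataset = raw.filter(lambda ex: ex["study"] == study)   # the single held-out study
    # Both splits are then tokenized with tokenizer(...) via .map(...) before being passed to the Trainer.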
MODEL AND TRAINING SETUP
import numpy as np
import evaluate
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

hf_model = "pritamdeka/BioBert-PubMed200kRCT"
tokenizer = AutoTokenizer.from_pretrained(hf_model)
model = AutoModelForSequenceClassification.from_pretrained(
    hf_model, num_labels=2, ignore_mismatched_sizes=True
)

training_args = TrainingArguments(
    output_dir='./results_' + study,  # output directory ('study' is the held-out study identifier)
    num_train_epochs=10,              # total number of training epochs
    warmup_steps=500,                 # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs',             # directory for storing logs
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset['train'],
    eval_dataset=eval_dataset['train'],
    compute_metrics=compute_metrics,
)

trainer.train()
The training process is able to execute and even gets to the point of saving checkpoints. It then fails, complaining that the line loss = self.compute_loss(model, inputs) inside trainer.py cannot compute the loss.
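Given that the error says the model only received input_ids, token_type_ids and attention_mask (no labels), my best guess is that some rows end up without a label. A quick sanity check I could run, assuming the label column is literally named "labels", would be something like:

    # Assumption: the label column is called "labels"; adjust if it is named differently.
    # Count rows in the training split whose label is missing -- such rows would make
    # the forward pass return logits only, with no loss.
    labels = train_dataset['train']['labels']
    missing = [i for i, lab in enumerate(labels) if lab is None]
    print(f"{len(missing)} rows without a label; first indices: {missing[:10]}")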
TERMINAL OUTPUT ERROR TRACE
***** Running training *****
Num examples = 33959
Num Epochs = 10
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 42450
Number of trainable parameters = 108311810
{'loss': 0.6865, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
{'loss': 0.6379, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}
{'loss': 0.5963, 'learning_rate': 3e-06, 'epoch': 0.01}
{'loss': 0.5146, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01}
{'loss': 0.5735, 'learning_rate': 5e-06, 'epoch': 0.01}
{'loss': 0.5246, 'learning_rate': 6e-06, 'epoch': 0.01}
{'loss': 0.4027, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.02}
{'loss': 0.3588, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
{'loss': 0.4078, 'learning_rate': 9e-06, 'epoch': 0.02}
{'loss': 0.3062, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 0.4038, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.03}
{'loss': 0.2701, 'learning_rate': 1.2e-05, 'epoch': 0.03}
{'loss': 0.4131, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.03}
{'loss': 0.1844, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.03}
{'loss': 0.4562, 'learning_rate': 1.5e-05, 'epoch': 0.04}
{'loss': 0.3156, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.04}
{'loss': 0.2584, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.04}
{'loss': 0.3993, 'learning_rate': 1.8e-05, 'epoch': 0.04}
{'loss': 0.2978, 'learning_rate': 1.9e-05, 'epoch': 0.04}
{'loss': 0.2447, 'learning_rate': 2e-05, 'epoch': 0.05}
{'loss': 0.4773, 'learning_rate': 2.1e-05, 'epoch': 0.05}
{'loss': 0.316, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.05}
{'loss': 0.3301, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.05}
{'loss': 0.3171, 'learning_rate': 2.4e-05, 'epoch': 0.06}
{'loss': 0.3225, 'learning_rate': 2.5e-05, 'epoch': 0.06}
{'loss': 0.1792, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.06}
{'loss': 0.2666, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.06}
{'loss': 0.352, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.07}
{'loss': 0.1078, 'learning_rate': 2.9e-05, 'epoch': 0.07}
{'loss': 0.3578, 'learning_rate': 3e-05, 'epoch': 0.07}
{'loss': 0.2425, 'learning_rate': 3.1e-05, 'epoch': 0.07}
{'loss': 0.3521, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.08}
{'loss': 0.3469, 'learning_rate': 3.3e-05, 'epoch': 0.08}
{'loss': 0.2463, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.08}
{'loss': 0.3306, 'learning_rate': 3.5e-05, 'epoch': 0.08}
{'loss': 0.3279, 'learning_rate': 3.6e-05, 'epoch': 0.08}
{'loss': 0.3013, 'learning_rate': 3.7e-05, 'epoch': 0.09}
{'loss': 0.3451, 'learning_rate': 3.8e-05, 'epoch': 0.09}
{'loss': 0.4788, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.09}
{'loss': 0.2289, 'learning_rate': 4e-05, 'epoch': 0.09}
{'loss': 0.2448, 'learning_rate': 4.1e-05, 'epoch': 0.1}
{'loss': 0.1923, 'learning_rate': 4.2e-05, 'epoch': 0.1}
{'loss': 0.4997, 'learning_rate': 4.3e-05, 'epoch': 0.1}
{'loss': 0.1394, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.1}
{'loss': 0.244, 'learning_rate': 4.5e-05, 'epoch': 0.11}
{'loss': 0.306, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.11}
{'loss': 0.4336, 'learning_rate': 4.7e-05, 'epoch': 0.11}
{'loss': 0.3012, 'learning_rate': 4.8e-05, 'epoch': 0.11}
{'loss': 0.2169, 'learning_rate': 4.9e-05, 'epoch': 0.12}
{'loss': 0.3365, 'learning_rate': 5e-05, 'epoch': 0.12}
1%|█▋ | 500/42450 [4:26:47<379:46:19, 32.59s/it]Saving model checkpoint to ./results_LANB/checkpoint-500
Configuration saved in ./results_LANB/checkpoint-500/config.json
Model weights saved in ./results_LANB/checkpoint-500/pytorch_model.bin
{'loss': 0.2766, 'learning_rate': 4.99880810488677e-05, 'epoch': 0.12}
{'loss': 0.4657, 'learning_rate': 4.99761620977354e-05, 'epoch': 0.12}
{'loss': 0.4474, 'learning_rate': 4.99642431466031e-05, 'epoch': 0.12}
{'loss': 0.2642, 'learning_rate': 4.99523241954708e-05, 'epoch': 0.13}
{'loss': 0.2248, 'learning_rate': 4.99404052443385e-05, 'epoch': 0.13}
{'loss': 0.3614, 'learning_rate': 4.99284862932062e-05, 'epoch': 0.13}
{'loss': 0.3968, 'learning_rate': 4.99165673420739e-05, 'epoch': 0.13}
{'loss': 0.2896, 'learning_rate': 4.99046483909416e-05, 'epoch': 0.14}
{'loss': 0.3039, 'learning_rate': 4.98927294398093e-05, 'epoch': 0.14}
{'loss': 0.378, 'learning_rate': 4.9880810488676996e-05, 'epoch': 0.14}
{'loss': 0.3841, 'learning_rate': 4.98688915375447e-05, 'epoch': 0.14}
{'loss': 0.2408, 'learning_rate': 4.98569725864124e-05, 'epoch': 0.15}
{'loss': 0.3626, 'learning_rate': 4.98450536352801e-05, 'epoch': 0.15}
{'loss': 0.6374, 'learning_rate': 4.9833134684147795e-05, 'epoch': 0.15}
{'loss': 0.3034, 'learning_rate': 4.98212157330155e-05, 'epoch': 0.15}
{'loss': 0.3923, 'learning_rate': 4.98092967818832e-05, 'epoch': 0.16}
{'loss': 0.2183, 'learning_rate': 4.9797377830750896e-05, 'epoch': 0.16}
{'loss': 0.526, 'learning_rate': 4.9785458879618594e-05, 'epoch': 0.16}
{'loss': 0.5221, 'learning_rate': 4.97735399284863e-05, 'epoch': 0.16}
{'loss': 0.3909, 'learning_rate': 4.9761620977354e-05, 'epoch': 0.16}
{'loss': 0.2689, 'learning_rate': 4.9749702026221695e-05, 'epoch': 0.17}
{'loss': 0.3097, 'learning_rate': 4.973778307508939e-05, 'epoch': 0.17}
{'loss': 0.365, 'learning_rate': 4.972586412395709e-05, 'epoch': 0.17}
{'loss': 0.3746, 'learning_rate': 4.9713945172824796e-05, 'epoch': 0.17}
{'loss': 0.294, 'learning_rate': 4.9702026221692494e-05, 'epoch': 0.18}
{'loss': 0.517, 'learning_rate': 4.969010727056019e-05, 'epoch': 0.18}
{'loss': 0.2461, 'learning_rate': 4.967818831942789e-05, 'epoch': 0.18}
{'loss': 0.2441, 'learning_rate': 4.9666269368295595e-05, 'epoch': 0.18}
{'loss': 0.6501, 'learning_rate': 4.965435041716329e-05, 'epoch': 0.19}
{'loss': 0.2033, 'learning_rate': 4.964243146603099e-05, 'epoch': 0.19}
{'loss': 0.5471, 'learning_rate': 4.963051251489869e-05, 'epoch': 0.19}
{'loss': 0.3918, 'learning_rate': 4.9618593563766394e-05, 'epoch': 0.19}
{'loss': 0.3586, 'learning_rate': 4.960667461263409e-05, 'epoch': 0.2}
{'loss': 0.4375, 'learning_rate': 4.959475566150179e-05, 'epoch': 0.2}
{'loss': 0.4539, 'learning_rate': 4.958283671036949e-05, 'epoch': 0.2}
{'loss': 0.263, 'learning_rate': 4.957091775923719e-05, 'epoch': 0.2}
{'loss': 0.4606, 'learning_rate': 4.955899880810489e-05, 'epoch': 0.2}
{'loss': 0.3028, 'learning_rate': 4.954707985697259e-05, 'epoch': 0.21}
{'loss': 0.3524, 'learning_rate': 4.953516090584029e-05, 'epoch': 0.21}
{'loss': 0.4138, 'learning_rate': 4.952324195470799e-05, 'epoch': 0.21}
{'loss': 0.4208, 'learning_rate': 4.951132300357569e-05, 'epoch': 0.21}
{'loss': 0.2863, 'learning_rate': 4.949940405244339e-05, 'epoch': 0.22}
{'loss': 0.302, 'learning_rate': 4.9487485101311086e-05, 'epoch': 0.22}
{'loss': 0.3323, 'learning_rate': 4.9475566150178784e-05, 'epoch': 0.22}
{'loss': 0.4712, 'learning_rate': 4.946364719904649e-05, 'epoch': 0.22}
{'loss': 0.3064, 'learning_rate': 4.9451728247914187e-05, 'epoch': 0.23}
{'loss': 0.4308, 'learning_rate': 4.9439809296781885e-05, 'epoch': 0.23}
{'loss': 0.2751, 'learning_rate': 4.942789034564958e-05, 'epoch': 0.23}
{'loss': 0.2495, 'learning_rate': 4.941597139451729e-05, 'epoch': 0.23}
{'loss': 0.3821, 'learning_rate': 4.9404052443384986e-05, 'epoch': 0.24}
2%|███▍ | 1000/42450 [8:40:02<362:47:15, 31.51s/it]Saving model checkpoint to ./results_LANB/checkpoint-1000
Configuration saved in ./results_LANB/checkpoint-1000/config.json
Model weights saved in ./results_LANB/checkpoint-1000/pytorch_model.bin
{'loss': 0.3137, 'learning_rate': 4.9392133492252684e-05, 'epoch': 0.24}
{'loss': 0.2284, 'learning_rate': 4.938021454112038e-05, 'epoch': 0.24}
{'loss': 0.4685, 'learning_rate': 4.9368295589988086e-05, 'epoch': 0.24}
{'loss': 0.3821, 'learning_rate': 4.9356376638855784e-05, 'epoch': 0.24}
{'loss': 0.3947, 'learning_rate': 4.934445768772348e-05, 'epoch': 0.25}
{'loss': 0.3813, 'learning_rate': 4.933253873659118e-05, 'epoch': 0.25}
{'loss': 0.3266, 'learning_rate': 4.9320619785458885e-05, 'epoch': 0.25}
{'loss': 0.2316, 'learning_rate': 4.930870083432658e-05, 'epoch': 0.25}
{'loss': 0.3027, 'learning_rate': 4.929678188319428e-05, 'epoch': 0.26}
{'loss': 0.2437, 'learning_rate': 4.928486293206198e-05, 'epoch': 0.26}
{'loss': 0.402, 'learning_rate': 4.9272943980929684e-05, 'epoch': 0.26}
{'loss': 0.3114, 'learning_rate': 4.926102502979738e-05, 'epoch': 0.26}
{'loss': 0.276, 'learning_rate': 4.924910607866508e-05, 'epoch': 0.27}
3%|███▉ | 1139/42450 [9:49:49<332:02:34, 28.94s/it]Traceback (most recent call last):
File "/Users/john/Projects/ACS_NLP/experiments/03_BioBertPubMed.py", line 88, in <module>
trainer.train()
File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1501, in train
return inner_training_loop(
File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2508, in training_step
loss = self.compute_loss(model, inputs)
File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2553, in compute_loss
raise ValueError(
ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,token_type_ids,attention_mask.
3%|███▉