Training Fails after multiple passes: ValueError: The model did not return a loss from the inputs

Good morning,

I have a strange error that gets thrown once my training process is several percent underway.

The error is: ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,token_type_ids,attention_mask.

What is unusual is that, as you can see from the terminal trace below, the training process has been successfully calculating and outputting the loss for multiple iterations. It appears that at some point in the process it is no longer able to calculate the loss on data that it had no problem with 80 or 90 times previously.

I am training a version of the BioBERT model on a custom dataset.
I am loading the full dataset and splitting out one study for evaluation.

MODEL AND TRAINING SETUP

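import evaluate
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

metric = evaluate.load("accuracy")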

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

hf_model = "pritamdeka/BioBert-PubMed200kRCT"
tokenizer = AutoTokenizer.from_pretrained(hf_model)
model = AutoModelForSequenceClassification.from_pretrained(hf_model, num_labels=2, ignore_mismatched_sizes=True)

training_args = TrainingArguments(
        output_dir='./results_'+study,          # output directory
        num_train_epochs=10,                     # total number of training epochs
        warmup_steps=500,                       # number of warmup steps for learning rate scheduler
        weight_decay=0.01,                      # strength of weight decay
        logging_dir='./logs',                   # directory for storing logs
        logging_steps=10,
)
trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset['train'],
        eval_dataset=eval_dataset['train'],
        compute_metrics=compute_metrics,
)
trainer.train()

The training process is able to execute and even gets to the point of saving a checkpoint.

It then fails at the line loss = self.compute_loss(model, inputs) inside trainer.py, which raises the error because the model output contains no loss.

TERMINAL OUTPUT ERROR TRACE

***** Running training *****
  Num examples = 33959
  Num Epochs = 10
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 42450
  Number of trainable parameters = 108311810
{'loss': 0.6865, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}                                                                                                                           
{'loss': 0.6379, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}                                                                                                                           
{'loss': 0.5963, 'learning_rate': 3e-06, 'epoch': 0.01}                                                                                                                                           
{'loss': 0.5146, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01}                                                                                                                           
{'loss': 0.5735, 'learning_rate': 5e-06, 'epoch': 0.01}                                                                                                                                           
{'loss': 0.5246, 'learning_rate': 6e-06, 'epoch': 0.01}                                                                                                                                           
{'loss': 0.4027, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.02}                                                                                                                           
{'loss': 0.3588, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}                                                                                                                           
{'loss': 0.4078, 'learning_rate': 9e-06, 'epoch': 0.02}                                                                                                                                           
{'loss': 0.3062, 'learning_rate': 1e-05, 'epoch': 0.02}                                                                                                                                           
{'loss': 0.4038, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.03}                                                                                                                          
{'loss': 0.2701, 'learning_rate': 1.2e-05, 'epoch': 0.03}                                                                                                                                         
{'loss': 0.4131, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.03}                                                                                                                          
{'loss': 0.1844, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.03}                                                                                                                          
{'loss': 0.4562, 'learning_rate': 1.5e-05, 'epoch': 0.04}                                                                                                                                         
{'loss': 0.3156, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.04}                                                                                                                          
{'loss': 0.2584, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.04}                                                                                                                          
{'loss': 0.3993, 'learning_rate': 1.8e-05, 'epoch': 0.04}                                                                                                                                         
{'loss': 0.2978, 'learning_rate': 1.9e-05, 'epoch': 0.04}                                                                                                                                         
{'loss': 0.2447, 'learning_rate': 2e-05, 'epoch': 0.05}                                                                                                                                           
{'loss': 0.4773, 'learning_rate': 2.1e-05, 'epoch': 0.05}                                                                                                                                         
{'loss': 0.316, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.05}                                                                                                                           
{'loss': 0.3301, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.05}                                                                                                                          
{'loss': 0.3171, 'learning_rate': 2.4e-05, 'epoch': 0.06}                                                                                                                                         
{'loss': 0.3225, 'learning_rate': 2.5e-05, 'epoch': 0.06}                                                                                                                                         
{'loss': 0.1792, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.06}                                                                                                                          
{'loss': 0.2666, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.06}                                                                                                                          
{'loss': 0.352, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.07}                                                                                                                           
{'loss': 0.1078, 'learning_rate': 2.9e-05, 'epoch': 0.07}                                                                                                                                         
{'loss': 0.3578, 'learning_rate': 3e-05, 'epoch': 0.07}                                                                                                                                           
{'loss': 0.2425, 'learning_rate': 3.1e-05, 'epoch': 0.07}                                                                                                                                         
{'loss': 0.3521, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.08}                                                                                                                          
{'loss': 0.3469, 'learning_rate': 3.3e-05, 'epoch': 0.08}                                                                                                                                         
{'loss': 0.2463, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.08}                                                                                                                          
{'loss': 0.3306, 'learning_rate': 3.5e-05, 'epoch': 0.08}                                                                                                                                         
{'loss': 0.3279, 'learning_rate': 3.6e-05, 'epoch': 0.08}                                                                                                                                         
{'loss': 0.3013, 'learning_rate': 3.7e-05, 'epoch': 0.09}                                                                                                                                         
{'loss': 0.3451, 'learning_rate': 3.8e-05, 'epoch': 0.09}                                                                                                                                         
{'loss': 0.4788, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.09}                                                                                                                          
{'loss': 0.2289, 'learning_rate': 4e-05, 'epoch': 0.09}                                                                                                                                           
{'loss': 0.2448, 'learning_rate': 4.1e-05, 'epoch': 0.1}                                                                                                                                          
{'loss': 0.1923, 'learning_rate': 4.2e-05, 'epoch': 0.1}                                                                                                                                          
{'loss': 0.4997, 'learning_rate': 4.3e-05, 'epoch': 0.1}                                                                                                                                          
{'loss': 0.1394, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.1}                                                                                                                           
{'loss': 0.244, 'learning_rate': 4.5e-05, 'epoch': 0.11}                                                                                                                                          
{'loss': 0.306, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.11}                                                                                                                            
{'loss': 0.4336, 'learning_rate': 4.7e-05, 'epoch': 0.11}                                                                                                                                         
{'loss': 0.3012, 'learning_rate': 4.8e-05, 'epoch': 0.11}                                                                                                                                         
{'loss': 0.2169, 'learning_rate': 4.9e-05, 'epoch': 0.12}                                                                                                                                         
{'loss': 0.3365, 'learning_rate': 5e-05, 'epoch': 0.12}                                                                                                                                           
  1%|█▋ | 500/42450 [4:26:47<379:46:19, 32.59s/it]
Saving model checkpoint to ./results_LANB/checkpoint-500
Configuration saved in ./results_LANB/checkpoint-500/config.json
Model weights saved in ./results_LANB/checkpoint-500/pytorch_model.bin
{'loss': 0.2766, 'learning_rate': 4.99880810488677e-05, 'epoch': 0.12}                                                                                                                            
{'loss': 0.4657, 'learning_rate': 4.99761620977354e-05, 'epoch': 0.12}                                                                                                                            
{'loss': 0.4474, 'learning_rate': 4.99642431466031e-05, 'epoch': 0.12}                                                                                                                            
{'loss': 0.2642, 'learning_rate': 4.99523241954708e-05, 'epoch': 0.13}                                                                                                                            
{'loss': 0.2248, 'learning_rate': 4.99404052443385e-05, 'epoch': 0.13}                                                                                                                            
{'loss': 0.3614, 'learning_rate': 4.99284862932062e-05, 'epoch': 0.13}                                                                                                                            
{'loss': 0.3968, 'learning_rate': 4.99165673420739e-05, 'epoch': 0.13}                                                                                                                            
{'loss': 0.2896, 'learning_rate': 4.99046483909416e-05, 'epoch': 0.14}                                                                                                                            
{'loss': 0.3039, 'learning_rate': 4.98927294398093e-05, 'epoch': 0.14}                                                                                                                            
{'loss': 0.378, 'learning_rate': 4.9880810488676996e-05, 'epoch': 0.14}                                                                                                                           
{'loss': 0.3841, 'learning_rate': 4.98688915375447e-05, 'epoch': 0.14}                                                                                                                            
{'loss': 0.2408, 'learning_rate': 4.98569725864124e-05, 'epoch': 0.15}                                                                                                                            
{'loss': 0.3626, 'learning_rate': 4.98450536352801e-05, 'epoch': 0.15}                                                                                                                            
{'loss': 0.6374, 'learning_rate': 4.9833134684147795e-05, 'epoch': 0.15}                                                                                                                          
{'loss': 0.3034, 'learning_rate': 4.98212157330155e-05, 'epoch': 0.15}                                                                                                                            
{'loss': 0.3923, 'learning_rate': 4.98092967818832e-05, 'epoch': 0.16}                                                                                                                            
{'loss': 0.2183, 'learning_rate': 4.9797377830750896e-05, 'epoch': 0.16}                                                                                                                          
{'loss': 0.526, 'learning_rate': 4.9785458879618594e-05, 'epoch': 0.16}                                                                                                                           
{'loss': 0.5221, 'learning_rate': 4.97735399284863e-05, 'epoch': 0.16}                                                                                                                            
{'loss': 0.3909, 'learning_rate': 4.9761620977354e-05, 'epoch': 0.16}                                                                                                                             
{'loss': 0.2689, 'learning_rate': 4.9749702026221695e-05, 'epoch': 0.17}                                                                                                                          
{'loss': 0.3097, 'learning_rate': 4.973778307508939e-05, 'epoch': 0.17}                                                                                                                           
{'loss': 0.365, 'learning_rate': 4.972586412395709e-05, 'epoch': 0.17}                                                                                                                            
{'loss': 0.3746, 'learning_rate': 4.9713945172824796e-05, 'epoch': 0.17}                                                                                                                          
{'loss': 0.294, 'learning_rate': 4.9702026221692494e-05, 'epoch': 0.18}                                                                                                                           
{'loss': 0.517, 'learning_rate': 4.969010727056019e-05, 'epoch': 0.18}                                                                                                                            
{'loss': 0.2461, 'learning_rate': 4.967818831942789e-05, 'epoch': 0.18}                                                                                                                           
{'loss': 0.2441, 'learning_rate': 4.9666269368295595e-05, 'epoch': 0.18}                                                                                                                          
{'loss': 0.6501, 'learning_rate': 4.965435041716329e-05, 'epoch': 0.19}                                                                                                                           
{'loss': 0.2033, 'learning_rate': 4.964243146603099e-05, 'epoch': 0.19}                                                                                                                           
{'loss': 0.5471, 'learning_rate': 4.963051251489869e-05, 'epoch': 0.19}                                                                                                                           
{'loss': 0.3918, 'learning_rate': 4.9618593563766394e-05, 'epoch': 0.19}                                                                                                                          
{'loss': 0.3586, 'learning_rate': 4.960667461263409e-05, 'epoch': 0.2}                                                                                                                            
{'loss': 0.4375, 'learning_rate': 4.959475566150179e-05, 'epoch': 0.2}                                                                                                                            
{'loss': 0.4539, 'learning_rate': 4.958283671036949e-05, 'epoch': 0.2}                                                                                                                            
{'loss': 0.263, 'learning_rate': 4.957091775923719e-05, 'epoch': 0.2}                                                                                                                             
{'loss': 0.4606, 'learning_rate': 4.955899880810489e-05, 'epoch': 0.2}                                                                                                                            
{'loss': 0.3028, 'learning_rate': 4.954707985697259e-05, 'epoch': 0.21}                                                                                                                           
{'loss': 0.3524, 'learning_rate': 4.953516090584029e-05, 'epoch': 0.21}                                                                                                                           
{'loss': 0.4138, 'learning_rate': 4.952324195470799e-05, 'epoch': 0.21}                                                                                                                           
{'loss': 0.4208, 'learning_rate': 4.951132300357569e-05, 'epoch': 0.21}                                                                                                                           
{'loss': 0.2863, 'learning_rate': 4.949940405244339e-05, 'epoch': 0.22}                                                                                                                           
{'loss': 0.302, 'learning_rate': 4.9487485101311086e-05, 'epoch': 0.22}                                                                                                                           
{'loss': 0.3323, 'learning_rate': 4.9475566150178784e-05, 'epoch': 0.22}                                                                                                                          
{'loss': 0.4712, 'learning_rate': 4.946364719904649e-05, 'epoch': 0.22}                                                                                                                           
{'loss': 0.3064, 'learning_rate': 4.9451728247914187e-05, 'epoch': 0.23}                                                                                                                          
{'loss': 0.4308, 'learning_rate': 4.9439809296781885e-05, 'epoch': 0.23}                                                                                                                          
{'loss': 0.2751, 'learning_rate': 4.942789034564958e-05, 'epoch': 0.23}                                                                                                                           
{'loss': 0.2495, 'learning_rate': 4.941597139451729e-05, 'epoch': 0.23}                                                                                                                           
{'loss': 0.3821, 'learning_rate': 4.9404052443384986e-05, 'epoch': 0.24}                                                                                                                          
  2%|███ | 1000/42450 [8:40:02<362:47:15, 31.51s/it]
Saving model checkpoint to ./results_LANB/checkpoint-1000
Configuration saved in ./results_LANB/checkpoint-1000/config.json
Model weights saved in ./results_LANB/checkpoint-1000/pytorch_model.bin
{'loss': 0.3137, 'learning_rate': 4.9392133492252684e-05, 'epoch': 0.24}                                                                                                                          
{'loss': 0.2284, 'learning_rate': 4.938021454112038e-05, 'epoch': 0.24}                                                                                                                           
{'loss': 0.4685, 'learning_rate': 4.9368295589988086e-05, 'epoch': 0.24}                                                                                                                          
{'loss': 0.3821, 'learning_rate': 4.9356376638855784e-05, 'epoch': 0.24}                                                                                                                          
{'loss': 0.3947, 'learning_rate': 4.934445768772348e-05, 'epoch': 0.25}                                                                                                                           
{'loss': 0.3813, 'learning_rate': 4.933253873659118e-05, 'epoch': 0.25}                                                                                                                           
{'loss': 0.3266, 'learning_rate': 4.9320619785458885e-05, 'epoch': 0.25}                                                                                                                          
{'loss': 0.2316, 'learning_rate': 4.930870083432658e-05, 'epoch': 0.25}                                                                                                                           
{'loss': 0.3027, 'learning_rate': 4.929678188319428e-05, 'epoch': 0.26}                                                                                                                           
{'loss': 0.2437, 'learning_rate': 4.928486293206198e-05, 'epoch': 0.26}                                                                                                                           
{'loss': 0.402, 'learning_rate': 4.9272943980929684e-05, 'epoch': 0.26}                                                                                                                           
{'loss': 0.3114, 'learning_rate': 4.926102502979738e-05, 'epoch': 0.26}                                                                                                                           
{'loss': 0.276, 'learning_rate': 4.924910607866508e-05, 'epoch': 0.27}                                                                                                                            
  3%|███▉ | 1139/42450 [9:49:49<332:02:34, 28.94s/it]
Traceback (most recent call last):
  File "/Users/john/Projects/ACS_NLP/experiments/03_BioBertPubMed.py", line 88, in <module>
    trainer.train()
  File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1501, in train
    return inner_training_loop(
  File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2508, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/john/Projects/ACS_NLP/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2553, in compute_loss
    raise ValueError(
ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,token_type_ids,attention_mask.

A colleague of mine pointed out that this is likely caused by a NULL value in the labels column.

I found several records with this problem, so this is probably the issue. I am re-running the training now.
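In case it helps others, here is a minimal sketch of the kind of check that finds such records. It assumes the data can be viewed as a list of dicts with a column named label; the function name find_missing_labels and the toy rows are made up for illustration. As far as I can tell, the default collator used by Trainer only builds the labels tensor when the first record of a batch has a non-null label, which would explain why training runs fine until an affected record happens to land at the front of a shuffled batch.

```python
import math


def find_missing_labels(records, label_key="label"):
    """Return the indices of records whose label is absent, None, or NaN."""
    missing = []
    for index, record in enumerate(records):
        value = record.get(label_key)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(index)
    return missing


# Toy rows standing in for the real training data (made-up values).
rows = [
    {"text": "aspirin reduced pain", "label": 1},
    {"text": "no effect observed", "label": None},
    {"text": "dose was doubled", "label": float("nan")},
    {"text": "placebo group", "label": 0},
]

bad = find_missing_labels(rows)
bad_set = set(bad)

# Keep only the rows that have a usable label.
clean_rows = [row for index, row in enumerate(rows) if index not in bad_set]

print("rows with missing labels:", bad)
print("rows kept:", len(clean_rows))
```

Running a check like this on both the training and evaluation splits before calling trainer.train() is much cheaper than discovering the problem several hours into a run.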


@John-Hawkins, did you manage to find a solution? I am facing the same problem, and I have checked that I don't have any null values in the training or test set.

I just had this same error and solved it. The error message does not make the real issue at all clear. In most comments here, as well as in my case, the issue was with the dataset. So for anyone who ends up here: check your dataset thoroughly for column names, ids, and anything else that is missing or incorrect.
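A rough sketch of what a broader dataset check could look like, beyond null labels. Everything here is an illustrative assumption rather than code from this thread: the function name audit_records, the list of required columns, and the toy rows. It flags records with a missing or null required column and labels that fall outside the range implied by num_labels.

```python
def audit_records(records, required_keys, num_labels, label_key="label"):
    """Collect human-readable problems found in a list of dict records."""
    problems = []
    for index, record in enumerate(records):
        # Every required column must be present and non-null.
        for key in required_keys:
            if key not in record or record[key] is None:
                problems.append(f"row {index}: missing {key}")
        # A label that is present must be one of 0 .. num_labels - 1.
        label = record.get(label_key)
        if label is not None and label not in range(num_labels):
            problems.append(
                f"row {index}: label {label!r} outside 0..{num_labels - 1}"
            )
    return problems


# Toy tokenized rows (made-up values) with three deliberate defects.
rows = [
    {"input_ids": [101, 7, 102], "attention_mask": [1, 1, 1], "label": 0},
    {"input_ids": [101, 9, 102], "attention_mask": None, "label": 1},
    {"input_ids": [101, 4, 102], "attention_mask": [1, 1, 1], "label": 2},
    {"input_ids": [101, 5, 102], "attention_mask": [1, 1, 1]},
]

problems = audit_records(
    rows,
    required_keys=("input_ids", "attention_mask", "label"),
    num_labels=2,
)
for problem in problems:
    print(problem)
```

Note that a label stored as a string such as "1" is also reported as out of range by this sketch, which is usually what you want, since a classification head expects integer class ids.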