Popping `inputs[labels]` when self.label_smoother is not None (in trainer.py)

Hi,

I was training my seq2seq model (I’m using Seq2SeqTrainer) with label smoothing and hit an error saying that input_ids was required in my training dataset, even though I checked that they are in the dataset.

While debugging, I found that when self.label_smoother is not None, the labels item is popped from inputs, and the error comes from outputs = model(**inputs), as shown in the following lines of trainer.py:

1872     def compute_loss(self, model, inputs, return_outputs=False):
1873         """
1874         How the loss is computed by Trainer. By default, all models return the loss in the first element.
1875 
1876         Subclass and override for custom behavior.
1877         """
1878         if self.label_smoother is not None and "labels" in inputs:
1879             labels = inputs.pop("labels")
1880         else:
1881             labels = None
1882         outputs = model(**inputs)

Question: is line 1879 intended? I think it should be either
labels = copy.deepcopy(inputs['labels']) or labels = inputs['labels']
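
To make the behavior concrete, here is a minimal pure-Python sketch of what I think is happening. Note that toy_forward and split_labels are hypothetical stand-ins I wrote for illustration (they are not the library's code), and the start token 0 and the error message are arbitrary assumptions:

```python
def toy_forward(input_ids, decoder_input_ids=None, labels=None):
    """Hypothetical stand-in for a seq2seq model's forward pass."""
    if decoder_input_ids is None:
        if labels is None:
            # Nothing to feed the decoder: neither decoder inputs nor labels.
            raise ValueError("decoder inputs are missing")
        # Assumed behavior: derive decoder inputs from labels (start token 0).
        decoder_input_ids = [0] + labels[:-1]
    return {"decoder_input_ids": decoder_input_ids, "computed_loss": labels is not None}


def split_labels(inputs, use_label_smoothing):
    """Mirrors the logic of lines 1878-1881; note that pop mutates `inputs`."""
    if use_label_smoothing and "labels" in inputs:
        return inputs.pop("labels")
    return None


# Without label smoothing, labels stay in `inputs` and the toy model copes.
plain_inputs = {"input_ids": [5, 6, 7], "labels": [8, 9, 1]}
split_labels(plain_inputs, use_label_smoothing=False)
plain_out = toy_forward(**plain_inputs)

# With label smoothing, labels are popped and the toy model has no decoder inputs.
smoothed_inputs = {"input_ids": [5, 6, 7], "labels": [8, 9, 1]}
smoothed_labels = split_labels(smoothed_inputs, use_label_smoothing=True)
try:
    toy_forward(**smoothed_inputs)
    error = None
except ValueError as exc:
    error = str(exc)
```

In this sketch the second call fails only because the batch carries no decoder inputs of its own once the labels are gone.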

I searched this board but couldn’t find any similar post. That suggests other people are using label smoothing without any problem, so I probably misunderstand something about seq2seq training and label smoothing.

Any comment would be greatly appreciated.

Hey @jbeh can you share a minimal reproducible example? For example, something simple that just shows:

  • How you load and tokenize the datasets
  • How you define the training arguments
  • How you define the trainer

That will help us understand better what is causing the issue 🙂

The labels are popped because otherwise your model would compute the loss twice (once inside the model and once in the label smoother), which means two softmaxes, a very heavy operation. To use label smoothing with the Trainer, you need to pass decoder_input_ids along in the batch, as generated by DataCollatorForSeq2Seq.
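
To illustrate the idea, here is a rough pure-Python sketch of how decoder_input_ids can be derived from labels by shifting them one position to the right. It is not the library's implementation: shift_tokens_right and add_decoder_input_ids are hypothetical helpers, and pad_token_id=0 and decoder_start_token_id=2 are made-up values. As far as I know, the real collator does this step for you when you pass it the model (DataCollatorForSeq2Seq(tokenizer, model=model)), and it pads labels with -100 by default.

```python
IGNORE_INDEX = -100  # label padding value that the loss ignores


def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Prepend the start token, drop the last token, and replace -100 with pad."""
    shifted = []
    for row in labels:
        shifted_row = [decoder_start_token_id] + row[:-1]
        shifted.append([pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted_row])
    return shifted


def add_decoder_input_ids(batch, pad_token_id=0, decoder_start_token_id=2):
    """Return a copy of the batch with decoder_input_ids built from labels."""
    batch = dict(batch)  # shallow copy so the caller's batch is not mutated
    batch["decoder_input_ids"] = shift_tokens_right(
        batch["labels"], pad_token_id, decoder_start_token_id
    )
    return batch


batch = {
    "input_ids": [[5, 6, 7, 1], [5, 6, 1, 0]],
    "labels": [[8, 9, 3, 1], [8, 1, -100, -100]],
}
model_inputs = add_decoder_input_ids(batch)

# Popping labels, as compute_loss does with label smoothing, is now harmless:
# the decoder inputs travel in the batch on their own.
labels = model_inputs.pop("labels")
```

With decoder_input_ids already in the batch, the model no longer needs labels to build its decoder inputs, so the label smoother can compute the loss from the popped labels and the model outputs.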