Will Trainer loss functions automatically ignore -100?

I’m working through the Token classification tutorial and have a question:
In the preprocessing part, the labels of special tokens, and of tokens that are not the first token of a word, are set to -100. The tutorial says this will prove useful in the loss function, but the code never passes anything like ignore_index to a loss function; the -100 only seems to matter during metric evaluation.
So, does the loss function in Trainer automatically ignore -100?
Or is it handled through the num_labels, id2label, and label2id parameters? That is, if I set those parameters, will all other labels be ignored?
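For context, the preprocessing step I mean looks roughly like this (a sketch; `align_labels` is my own helper name, and `word_ids` stands in for what a fast tokenizer's `word_ids()` returns):

```python
# None marks special tokens ([CLS], [SEP], padding); otherwise each entry
# is the index of the source word that the sub-token came from.
def align_labels(word_ids, word_labels):
    labels = []
    previous = None
    for wid in word_ids:
        if wid is None:            # special token -> masked with -100
            labels.append(-100)
        elif wid != previous:      # first sub-token of a word keeps its label
            labels.append(word_labels[wid])
        else:                      # later sub-tokens of the same word -> -100
            labels.append(-100)
        previous = wid
    return labels

# e.g. a 3-word sentence where word 1 is split into two sub-tokens:
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [3, 0, 7]
print(align_labels(word_ids, word_labels))  # [-100, 3, 0, -100, 7, -100]
```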

I have the same question for GPT2LMHeadModel.

Edit: -100 is the default ignore_index in PyTorch’s CrossEntropyLoss. So, any token with a label of -100 will be ignored in loss computation.
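To convince myself, I wrote a tiny pure-Python re-implementation of that ignore_index behaviour (a sketch of the semantics, not PyTorch's actual code): masked positions contribute nothing to the sum and are excluded from the mean's denominator.

```python
import math

def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over the non-ignored positions only."""
    total, count = 0.0, 0
    for row, t in zip(logits, targets):
        if t == ignore_index:
            continue                           # skipped entirely
        log_sum = math.log(sum(math.exp(x) for x in row))
        total += log_sum - row[t]              # -log softmax(row)[t]
        count += 1
    return total / count

logits = [[2.0, 0.5], [0.1, 1.5], [3.0, 3.0]]
full = cross_entropy(logits, [0, -100, 1])     # middle token ignored
kept = cross_entropy([logits[0], logits[2]], [0, 1])
print(full == kept)                            # masking a token gives the
                                               # same loss as dropping it
```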
