Should the padding token be ignored in the loss function?

FeryET · August 24, 2021, 8:03am

Hi.

I’m trying to train a GPT2 model, and looking at how the loss is computed, I don’t see the padding token or the EOS token being ignored by the loss function. Why is that? With RNNs and similar networks we usually ignore pads in the loss, since their gradients don’t matter to us and we don’t want to waste resources on them. Reading the code, though, I think the reasoning here might be a bit different?
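
To make the question concrete, here is a minimal pure-Python sketch of the kind of masking I mean. It assumes the PyTorch convention where label positions set to -100 (the default `ignore_index` of `torch.nn.CrossEntropyLoss`) are skipped. The function name `masked_cross_entropy` and the toy numbers are made up for illustration; this is not the GPT2 code from the library.

```python
import math

# Assumption: mirrors the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of target token ids, with ignore_index at padded positions
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padded position: contributes nothing to the loss
        # Numerically stable log-sum-exp, then negative log-probability of the target.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
        count += 1
    # Returning 0.0 when every position is ignored is a choice made for this sketch.
    return total / count if count else 0.0


# Toy sequence: 3 real positions followed by 2 padded ones (vocabulary of size 3).
logits = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.2, 3.0],
    [9.0, 0.0, 0.0],  # padded position
    [0.0, 9.0, 0.0],  # padded position
]
labels = [0, 1, 2, IGNORE_INDEX, IGNORE_INDEX]

loss_masked = masked_cross_entropy(logits, labels)
loss_real_only = masked_cross_entropy(logits[:3], labels[:3])
```

With this masking, `loss_masked` and `loss_real_only` agree, i.e. the padded positions have no effect on the loss. That is the behaviour I expected to find and could not spot in the GPT2 loss computation.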

Thanks!
