Why does my loss become NaN when I set the padding token in the labels to -100?

Hello everyone, I'm a new member of the Hugging Face community, and this is my first topic!

Recently I've been trying to generate music scores with GPT-2. The tokenized measures have different lengths, so I must pad them to a common length. The course says I should use -100 as the padding token in the labels, but once I use -100, the loss becomes NaN and the accuracy starts going down. If I instead pad the labels with 0 (the same padding id as the inputs), there is no such problem. Why does this happen?


My model is TFGPT2LMHeadModel, and the loss function is tf.keras.losses.SparseCategoricalCrossentropy. A minimal sketch of what I mean is below.
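Here random logits stand in for the model's output; the shapes, the vocabulary size of 10, and `from_logits=True` are just assumptions for illustration:

```python
import tensorflow as tf

# Two tokenized "measures" of different lengths, padded to length 4.
# Variant 1: labels padded with -100, as the course suggests.
labels_minus100 = tf.constant([[5, 8, 2, -100],
                               [7, 3, -100, -100]])
# Variant 2: labels padded with 0, the same id as the padded inputs.
labels_zero = tf.constant([[5, 8, 2, 0],
                           [7, 3, 0, 0]])

# Random logits standing in for the model output
# (batch of 2, sequence length 4, vocabulary of 10).
logits = tf.random.normal((2, 4, 10))

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

print(loss_fn(labels_zero, logits).numpy())      # a finite number
print(loss_fn(labels_minus100, logits).numpy())  # NaN (on GPU); on CPU this
                                                 # can instead raise an error,
                                                 # because -100 is not a valid
                                                 # class index
```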