Hello everyone, I’m a new member of the Hugging Face community, and this is my first topic!
Recently I’ve been trying to generate music scores with GPT-2. The tokenized measures have different lengths, so I have to pad them to a common length. The course says to use -100 as the padding token in the labels, but once I use -100, the loss becomes NaN and the accuracy starts going down. If I instead pad the labels with 0 (the same value I use for the inputs), there is no such problem. Why does this happen?
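To make the setup concrete, here is a minimal sketch of the padding I mean (toy vocabulary, hand-made measures, and random logits standing in for the model output; `measures`, `inputs`, and `logits` are just illustrative names, assuming a PyTorch-style cross-entropy loss like the one GPT-2 uses):

```python
import torch
import torch.nn.functional as F

# Toy setup (hypothetical numbers): a vocab of 5 token ids and two
# "measures" of different length, like my tokenized scores.
vocab_size = 5
measures = [[1, 2, 3], [4, 2]]
max_len = max(len(m) for m in measures)

# Pad the labels with -100, but pad the inputs with a real id such as 0
# (an embedding lookup would fail on -100).
labels = torch.tensor([m + [-100] * (max_len - len(m)) for m in measures])
inputs = torch.tensor([m + [0] * (max_len - len(m)) for m in measures])

# Random logits stand in for the model output here.
logits = torch.randn(len(measures), max_len, vocab_size)

# PyTorch's cross_entropy skips targets equal to -100 by default
# (ignore_index=-100), so padded positions don't contribute to the loss.
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
print(loss.item())
```

With this toy batch the loss comes out finite, since some labels are valid, yet in my real training run the -100 padding leads to NaN.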