Is there a way to get the per-word loss instead of the average loss for a GPT model?

  • Reposting as it seems to suit this subtopic better.


I would like to fine-tune a GPT-2 model using a custom loss function that returns zero loss for every token except the last one in a sentence. However, the loss in the model's output appears to be averaged over all tokens. Is it possible to get a per-word (per-token) loss instead?
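To make the goal concrete, here is a minimal sketch of the kind of loss I have in mind, assuming PyTorch. The helper names `per_token_loss` and `last_token_loss` are made up for illustration, random tensors stand in for the model's logits, and right padding with at least two real tokens per sequence is assumed. It may well not be the best way to do this.

```python
import torch
import torch.nn.functional as F

def per_token_loss(logits, labels):
    """Return the unreduced next-token cross-entropy, shape (batch, seq_len - 1)."""
    # Shift so that the logits at position t are scored against the token at t + 1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    # cross_entropy expects the class dimension second: (batch, vocab, seq_len - 1).
    return F.cross_entropy(
        shift_logits.transpose(1, 2), shift_labels, reduction="none"
    )

def last_token_loss(logits, labels, attention_mask):
    """Average the loss of the last non-padding token of each sequence."""
    token_loss = per_token_loss(logits, labels)
    # Index, within the shifted loss, of the last real token (assumes right padding).
    last_idx = attention_mask.sum(dim=1) - 2
    picked = token_loss.gather(1, last_idx.unsqueeze(1)).squeeze(1)
    return picked.mean()

# Random stand-ins for model output; with a real GPT-2 from the transformers
# library the logits would presumably come from `model(input_ids).logits`.
torch.manual_seed(0)
batch, seq_len, vocab = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab)
labels = torch.randint(0, vocab, (batch, seq_len))
attention_mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])

token_loss = per_token_loss(logits, labels)             # one loss per predicted token
loss = last_token_loss(logits, labels, attention_mask)  # scalar usable for backward()
```

The key piece is `reduction="none"`, which keeps one loss value per token instead of averaging; every token other than the last then contributes zero simply because it is never selected.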

Thanks in advance for any help.