Can we get a per-word loss from the output of a GPT model?


I would like to fine-tune a GPT-2 model using a custom loss function that returns zero loss for all but the last token in a sentence. However, the loss returned in the model's output seems to be averaged over all tokens. Is it possible to get a per-word loss?
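
To make the goal concrete, here is a minimal sketch of the kind of loss described above, assuming PyTorch and that the model's raw logits are available (with Hugging Face's `GPT2LMHeadModel`, that would be `outputs.logits`). The helper names `per_token_loss` and `last_token_loss` are hypothetical, and the sketch assumes right padding, an integer attention mask, and at least two real tokens per sequence. Random tensors stand in for real model output.

```python
import torch
import torch.nn.functional as F


def per_token_loss(logits, input_ids, attention_mask):
    """Unreduced next-token cross-entropy, shape (batch, seq_len - 1)."""
    # Position t predicts token t + 1, so drop the last logit and the first label.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    # cross_entropy expects the class dimension second: (batch, vocab, seq_len - 1).
    loss = F.cross_entropy(
        shift_logits.transpose(1, 2), shift_labels, reduction="none"
    )
    # Zero out positions whose target token is padding.
    return loss * attention_mask[:, 1:].to(loss.dtype)


def last_token_loss(logits, input_ids, attention_mask):
    """Mean loss on the final real token of each sequence (hypothetical helper)."""
    token_loss = per_token_loss(logits, input_ids, attention_mask)
    # With n real tokens, the last one is predicted at shifted index n - 2.
    # Assumes right padding and n >= 2.
    last_idx = attention_mask.sum(dim=1) - 2
    picked = token_loss.gather(1, last_idx.unsqueeze(1)).squeeze(1)
    return picked.mean()


# Stand-in data: random logits instead of real model output.
torch.manual_seed(0)
batch, seq_len, vocab = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
input_ids = torch.randint(0, vocab, (batch, seq_len))
attention_mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])

token_loss = per_token_loss(logits, input_ids, attention_mask)
loss = last_token_loss(logits, input_ids, attention_mask)
loss.backward()  # gradients flow only through the selected positions
```

The key point is `reduction="none"`, which keeps one loss value per position instead of the averaged scalar. In a real training loop the model would be called without `labels`, so that the loss is computed from the logits as above.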

Thanks in advance for any help.