Hi,
I would like to fine-tune a GPT-2 model using a custom loss function that returns zero loss for every token in a sentence except the last one. However, the loss returned by the model appears to be averaged over all tokens. Is it possible to get a per-token loss?
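One possible approach is to skip the model's built-in averaged loss and compute cross-entropy yourself from the logits with `reduction="none"`, which keeps one loss value per token. The sketch below assumes the logits come from something like `GPT2LMHeadModel(...)(input_ids, attention_mask=attention_mask).logits`, and that batches are right-padded. The helper names `per_token_loss` and `last_token_loss` are hypothetical, chosen for illustration.

```python
import torch
import torch.nn.functional as F


def per_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Unreduced cross-entropy for every predicted token.

    logits:    (batch, seq_len, vocab), e.g. the .logits of a causal LM output.
    input_ids: (batch, seq_len), the token ids fed to the model.
    Returns:   (batch, seq_len - 1); entry [b, t] is the loss for predicting
               input_ids[b, t + 1] from positions 0..t.
    """
    # Shift so that position t is scored against token t + 1,
    # mirroring the shift a causal LM applies when computing its own loss.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",  # keep one value per token instead of the mean
    )
    return loss.view(shift_labels.shape)


def last_token_loss(
    logits: torch.Tensor, input_ids: torch.Tensor, attention_mask: torch.Tensor
) -> torch.Tensor:
    """Loss on the last real token of each sequence only, averaged over the batch.

    Assumes right padding and at least two real tokens per sequence.
    """
    token_losses = per_token_loss(logits, input_ids)
    # Index of the last non-padding token in each row.
    last_idx = attention_mask.sum(dim=1) - 1
    # After the shift, the prediction of token i lives at column i - 1.
    rows = torch.arange(input_ids.size(0))
    return token_losses[rows, last_idx - 1].mean()
```

In a training loop this would be used as `loss = last_token_loss(outputs.logits, input_ids, attention_mask)` followed by `loss.backward()`. An alternative that should give the same value is to pass `labels` to the model with every position except the last real token set to `-100`, since the built-in loss ignores that index and averages only over the remaining tokens.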
Thanks in advance for any help.