Loss computed for single token in GPT-2

In GPT-2, how do I obtain the loss for each individual token when I run a forward pass on an input sequence? I understand how to get the loss for the entire sequence, but I am interested in the token-level loss. Any help is greatly appreciated.
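
To make the question concrete, here is a minimal sketch of what I mean by token-level loss. It assumes PyTorch and a logits tensor of shape (batch, seq_len, vocab_size); with Hugging Face's `GPT2LMHeadModel` these would presumably come from `model(input_ids).logits`, but random tensors stand in here so the snippet runs on its own. The helper name `per_token_loss` is hypothetical, not part of any library.

```python
import torch
import torch.nn.functional as F


def per_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for each predicted token, shape (batch, seq_len - 1).

    Hypothetical helper for illustration; not a library function.
    """
    # The logits at position t predict the token at position t + 1,
    # so drop the last logit and the first label to align them.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    # reduction="none" keeps one loss value per token instead of averaging.
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    return loss.view(shift_labels.shape)


# Stand-in data (assumption: random values, not real GPT-2 outputs).
torch.manual_seed(0)
logits = torch.randn(2, 5, 11)             # (batch, seq_len, vocab_size)
input_ids = torch.randint(0, 11, (2, 5))   # (batch, seq_len)

token_losses = per_token_loss(logits, input_ids)
print(token_losses.shape)
```

If no positions are masked out (for example, no padding), the mean of `token_losses` should match the usual single sequence-level loss, which is a quick way to sanity-check the result.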