I’m fine-tuning GPT-2 on a language modeling task. Given a sequence, I’d like to compute the per-token loss for each token (instead of the averaged loss over the sequence). How can I do it?

How to compute per-token loss when doing language modeling?

ahans1 August 23, 2023, 4:00pm 4

Thanks for your reply @chrisociepa. Yes, I was able to get it working with reduction='none'.
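
For anyone landing here later, this is roughly what that looks like. It is only a minimal sketch, not necessarily the exact code used in this thread: the helper name `per_token_loss` is made up for illustration, and random tensors stand in for real model outputs so the snippet runs without downloading GPT-2. It assumes the usual causal LM setup, where the logits at position t are scored against the token at position t+1.

```python
import torch
import torch.nn.functional as F


def per_token_loss(logits, labels, ignore_index=-100):
    """Return the cross-entropy loss of every predicted token (hypothetical helper).

    logits: (batch, seq_len, vocab_size) float tensor
    labels: (batch, seq_len) long tensor of token ids
    """
    # Causal LM shift: position t predicts token t+1, so drop the last
    # logit and the first label.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()

    # reduction="none" keeps one loss value per token instead of averaging.
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    # Restore the (batch, seq_len - 1) layout.
    return loss.view(shift_labels.shape)


# Stand-in data (assumption): random logits and labels instead of a real model.
torch.manual_seed(0)
batch, seq_len, vocab = 2, 6, 50
logits = torch.randn(batch, seq_len, vocab)
labels = torch.randint(0, vocab, (batch, seq_len))

token_losses = per_token_loss(logits, labels)
print(token_losses.shape)  # torch.Size([2, 5])
```

With a real model you would pass `model(input_ids).logits` as `logits` and `input_ids` as `labels`. Positions whose label equals `ignore_index` (for example padding set to -100) come back with a loss of 0, so mask them out before averaging.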
