How to compute per-token loss when doing language modeling?

I’m fine-tuning GPT-2 on a language modeling task. Given a sequence, I’d like to compute the loss for each individual token (instead of the loss averaged over the whole sequence). How can I do it?


Hey, it’s been some time since you posted this. Were you able to do it? If yes, would you be able to share code for it?

In PyTorch, you can use the CrossEntropyLoss function with the reduction parameter set to "none" ( CrossEntropyLoss — PyTorch 2.0 documentation ).
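For example, something along these lines (a minimal sketch; the model name, example sentence, and variable names are just placeholders):

```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
input_ids = inputs["input_ids"]

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, vocab_size)

# Shift so that each position predicts the *next* token
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:]

# reduction="none" keeps one loss value per token instead of averaging
loss_fct = CrossEntropyLoss(reduction="none")
per_token_loss = loss_fct(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
).view(shift_labels.size())  # (batch, seq_len - 1)

print(per_token_loss)
```

If you only want the loss on real tokens, you can additionally mask out padding positions before averaging or inspecting the values.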

You can find an example in a HF course: Training a causal language model from scratch - Hugging Face NLP Course (search for keytoken_weighted_loss to find it)
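The idea in that example is to compute the per-token loss first and then weight it before averaging. A paraphrased sketch of that pattern (not the exact course code; the function signature and names here are just illustrative):

```python
import torch
from torch.nn import CrossEntropyLoss

def keytoken_weighted_loss(input_ids, logits, keytoken_ids, alpha=1.0):
    # Shift so that tokens < n predict token n
    shift_labels = input_ids[:, 1:]
    shift_logits = logits[:, :-1, :]

    # Per-token loss via reduction="none"
    loss_fct = CrossEntropyLoss(reduction="none")
    loss = loss_fct(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    ).view(shift_labels.size())           # (batch, seq_len - 1)

    # Average per sample, then up-weight samples containing the key tokens
    loss_per_sample = loss.mean(dim=1)
    weights = torch.stack(
        [(input_ids == kt).float().sum(dim=1) for kt in keytoken_ids]
    ).sum(dim=0)
    weights = alpha * (1.0 + weights)
    return (loss_per_sample * weights).mean()
```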


Thanks for your reply @chrisociepa. Yes, I was able to make it work with reduction='none'.