# Token probabilities don't agree with the output loss

As a debugging exercise, I am trying to compute the cross entropy loss from token probabilities of a causal model’s output to verify that it is equal to the output loss.

My code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import math

# model_name is assumed to be defined earlier (the post does not name the checkpoint)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids
outputs = model(input_ids, labels=input_ids)
probs = outputs.logits.softmax(-1)
ids = input_ids.tolist()[0]

# my computation of the cross entropy loss
l = 0
for i, id in enumerate(ids):
    p = probs[0, i, id].item()
    token = tokenizer.decode(id)
    print(f'token: \'{token}\',   p = {p}')
    l -= math.log(p)

print("estimated loss = ", l / len(ids))
print("loss = ", outputs.loss.item())
```

Output:

```
token: '</s>',   p = 0.0035762879997491837
token: 'Today',   p = 1.0062270803246065e-06
token: ' is',   p = 0.00011504700523801148
token: ' a',   p = 0.00013640847464557737
token: ' nice',   p = 0.0009689405560493469
token: ' day',   p = 2.221063732577022e-05
estimated loss =  9.177834274679256
loss =  3.3444931507110596
```

The estimated loss, i.e. the average negative log probability of the next-token predictions, does not match the output loss. In fact, the probabilities themselves seem far too small. What am I doing wrong? Thank you.

My mistake was to include the first input token id (which the model never predicts, since nothing precedes it), which shifted the indexing into the `probs` tensor by one when computing the loss. I corrected the code by replacing

```python
ids = input_ids.tolist()[0]
```

with

```python
ids = input_ids.tolist()[0][1:]
```

and now everything works correctly:

```
token: 'Today',   p = 0.0004240850103087723
token: ' is',   p = 0.11540017277002335
token: ' a',   p = 0.1325085610151291
token: ' nice',   p = 0.010343511588871479
token: ' day',   p = 0.8146189451217651
estimated loss =  3.344492952270118
loss =  3.3444931507110596
```
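For anyone wanting to verify the alignment without downloading a model, here is a minimal self-contained sketch using synthetic logits. It assumes the usual causal-LM convention that the logits at position `i` are the distribution over the token at position `i + 1`, so the model's loss is the cross entropy between the logits shifted left and the labels shifted right:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len = 50, 6
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for outputs.logits
input_ids = torch.randint(0, vocab_size, (1, seq_len))

# Manual loop, as in the corrected code above: drop the first input token,
# so probs[0, i] (distribution after position i) is paired with token i + 1.
probs = logits.softmax(-1)
ids = input_ids.tolist()[0][1:]
manual = -sum(math.log(probs[0, i, tok].item()) for i, tok in enumerate(ids)) / len(ids)

# Vectorized equivalent of the same shift: logits for positions 0..n-2
# against labels for positions 1..n-1, averaged over the n-1 predictions.
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:]
vectorized = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size), shift_labels.reshape(-1)
).item()

print(manual, vectorized)  # the two values agree up to floating-point error
```

This mirrors why dividing by `len(ids)` works in the corrected code: after dropping the first token, the average runs over exactly the `seq_len - 1` positions the model actually predicts.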