# Token probabilities don't agree with the output loss

As a debugging exercise, I am trying to compute the cross entropy loss from token probabilities of a causal model’s output to verify that it is equal to the output loss.

My code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import math

# model_name is assumed to be defined earlier (the post does not name the checkpoint)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids
outputs = model(input_ids, labels=input_ids)
probs = outputs.logits.softmax(-1)
ids = input_ids.tolist()[0]

# my computation of the cross entropy loss
l = 0
for i, id in enumerate(ids):
    p = probs[0, i, id].item()
    token = tokenizer.decode(id)
    print(f'token: \'{token}\',   p = {p}')
    l -= math.log(p)

print("estimated loss = ", l / len(ids))
print("loss = ", outputs.loss.item())
```

Output:

```
token: '</s>',   p = 0.0035762879997491837
token: 'Today',   p = 1.0062270803246065e-06
token: ' is',   p = 0.00011504700523801148
token: ' a',   p = 0.00013640847464557737
token: ' nice',   p = 0.0009689405560493469
token: ' day',   p = 2.221063732577022e-05
estimated loss =  9.177834274679256
loss =  3.3444931507110596
```

The estimated loss, i.e. the average negative log probability of the next-token predictions, does not match the output loss. In fact, the probabilities themselves seem far too small. What am I doing wrong? Thank you.

My mistake was to include the first input token id (which the model never predicts, since nothing precedes it), which shifted the indexing into the `probs` tensor by one when computing the loss. I corrected the code by replacing

```python
ids = input_ids.tolist()[0]
```

with

```python
ids = input_ids.tolist()[0][1:]
```

and now everything works correctly:

```
token: 'Today',   p = 0.0004240850103087723
token: ' is',   p = 0.11540017277002335
token: ' a',   p = 0.1325085610151291
token: ' nice',   p = 0.010343511588871479
token: ' day',   p = 0.8146189451217651
estimated loss =  3.344492952270118
loss =  3.3444931507110596
```
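For anyone wanting to verify the alignment without downloading a model, here is a minimal self-contained sketch using synthetic logits. It assumes the usual causal-LM convention that the logits at position `i` are the distribution over the token at position `i + 1`, so the model's loss is the cross entropy between the logits shifted left and the labels shifted right:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len = 50, 6
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for outputs.logits
input_ids = torch.randint(0, vocab_size, (1, seq_len))

# Manual loop, as in the corrected code above: drop the first input token,
# so probs[0, i] (distribution after position i) is paired with token i + 1.
probs = logits.softmax(-1)
ids = input_ids.tolist()[0][1:]
manual = -sum(math.log(probs[0, i, tok].item()) for i, tok in enumerate(ids)) / len(ids)

# Vectorized equivalent of the same shift: logits for positions 0..n-2
# against labels for positions 1..n-1, averaged over the n-1 predictions.
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:]
vectorized = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size), shift_labels.reshape(-1)
).item()

print(manual, vectorized)  # the two values agree up to floating-point error
```

This mirrors why dividing by `len(ids)` works in the corrected code: after dropping the first token, the average runs over exactly the `seq_len - 1` positions the model actually predicts.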