As a debugging exercise, I am trying to compute the cross-entropy loss from the token probabilities of a causal language model's output, to verify that it matches the loss the model reports.
My code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import math
model_name = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids
outputs = model(input_ids, labels=input_ids)
probs = outputs.logits.softmax(-1)
ids = input_ids.tolist()[0]
# my computation of the cross entropy loss
l = 0
for i, token_id in enumerate(ids):
    p = probs[0, i, token_id].item()
    token = tokenizer.decode(token_id)
    print(f"token: '{token}', p = {p}")
    l -= math.log(p)
print("estimated loss = ",l/len(ids))
print("loss = ",outputs.loss.item())
Output:
token: '</s>', p = 0.0035762879997491837
token: 'Today', p = 1.0062270803246065e-06
token: ' is', p = 0.00011504700523801148
token: ' a', p = 0.00013640847464557737
token: ' nice', p = 0.0009689405560493469
token: ' day', p = 2.221063732577022e-05
estimated loss = 9.177834274679256
loss = 3.3444931507110596
The estimated loss, which is the average negative log probability of each token, does not match the reported loss. In fact, the probabilities themselves seem off (too small). What am I doing wrong? Thank you.
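For reference, my understanding is that transformers computes the causal-LM loss with the labels shifted by one position, so the logits at position i score the token at position i+1. A minimal self-contained sketch of that computation (random tensors standing in for the model's logits, and OPT's vocabulary size of 50272 assumed; this is not the actual model call above):

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 50272                          # OPT's vocabulary size
logits = torch.randn(1, 6, vocab_size)      # stand-in for outputs.logits
labels = torch.randint(0, vocab_size, (1, 6))  # stand-in for input_ids

# Shift so each row of logits is scored against the *next* token
shift_logits = logits[:, :-1, :]
shift_labels = labels[:, 1:]

# Library-style loss: cross entropy over the shifted pairs
loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)

# Equivalent manual average of -log p over the len-1 next-token predictions
probs = shift_logits.softmax(-1)
manual = -sum(
    math.log(probs[0, i, shift_labels[0, i]].item())
    for i in range(shift_labels.size(1))
) / shift_labels.size(1)

print(loss.item(), manual)
```

Note that with the shift there are only len(ids) - 1 predictions to average over, not len(ids).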