Hi,

When I compute the log-probabilities of the next token in two different ways with a decoder-only transformer, via `model.__call__` and via `model.generate(..., output_scores=True)`, the results don’t seem to be equal. Here is an MWE:

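```
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", device_map="cpu", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

prompt = "I don't know that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Way 1: sample exactly one new token and keep the scores from generate()
gen_values = model.generate(
    input_ids,
    output_scores=True, return_dict_in_generate=True, do_sample=True,
    max_new_tokens=1, min_new_tokens=1,
)
gen_seq = gen_values.sequences      # prompt tokens plus the one sampled token
gen_scores = gen_values.scores[0]   # scores for the single generation step

# Way 2: forward pass over the generated sequence, logits at the last position
scores = model(gen_seq, labels=gen_seq).logits[:, -1, :]

print(gen_scores.shape)
print(scores.shape)
# L1 distance between the two distributions; consistently between 1 and 2
print((F.softmax(gen_scores, dim=-1) - F.softmax(scores, dim=-1)).abs().sum())
```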

Could you please let me know if I’m making a mistake in computing the log-probabilities, and if so, where?
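
One thing that may be worth checking here is position alignment. In a causal LM, the logits at position i are the prediction for token i+1, so the last position of `gen_seq` (which already contains the sampled token) predicts the token after it, while `gen_values.scores[0]` was computed from the prompt alone. The sketch below illustrates this with a toy stand-in for the model in plain Python; `toy_logits`, `forward`, and the other helper names are hypothetical and not part of `transformers`. Separately, with `do_sample=True` the returned scores may already have been modified by sampling settings such as top-k, which this sketch does not model.

```python
import math

VOCAB_SIZE = 5  # toy vocabulary size (assumption for illustration)


def toy_logits(prefix):
    """Hypothetical stand-in for a causal LM: next-token logits given a prefix."""
    return [math.sin(sum(prefix) * (k + 1) + len(prefix)) for k in range(VOCAB_SIZE)]


def forward(seq):
    """Mimic a forward pass: position i holds the logits for the token at i + 1."""
    return [toy_logits(seq[: i + 1]) for i in range(len(seq))]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


prompt = [3, 1, 4]

# One generation step: scores come from the prompt only (greedy pick for simplicity).
gen_scores = toy_logits(prompt)
next_token = max(range(VOCAB_SIZE), key=lambda k: gen_scores[k])
gen_seq = prompt + [next_token]

# Forward pass over the sequence that already includes the new token.
all_logits = forward(gen_seq)

# Last position: prediction for the token AFTER the generated one.
shifted = l1_distance(softmax(all_logits[-1]), softmax(gen_scores))
# Second-to-last position: prediction for the generated token itself.
aligned = l1_distance(softmax(all_logits[-2]), softmax(gen_scores))

print("L1 distance using position -1:", shifted)
print("L1 distance using position -2:", aligned)
```

If that is the cause, the analogous check in the MWE would be to compare `gen_scores` against `model(input_ids).logits[:, -1, :]`, or against position `-2` of the forward pass over `gen_seq`.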