What is the difference between logits and scores?

The documentation in the link above makes me believe that scores are just a “processed” version of logits. This raises the question: how exactly are these logits processed?

I took a sample of these scores myself, and they look no different in meaning to logits. They don’t look like probabilities since some of them are clearly negative or above 1.0.

tensor([[-7.5898, -5.9922, 18.5625,  ..., -8.4844, -4.7539, -4.6758]],
                             device='cuda:0'))

Can someone please explain to me what these scores really are?

The linked documentation only describes the data class. I think the method of processing would vary based on where this class is used. I would guess that in many cases the logits are just softmaxed into a probability distribution.
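To illustrate what "softmaxed into a probability distribution" would mean, here is a minimal sketch in plain Python. The `softmax` helper is written here for illustration and is not library code; the input is only the handful of values visible in the tensor printed above, not the full vocabulary row.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Only the values shown in the printed tensor; the elided entries are not included.
sample = [-7.5898, -5.9922, 18.5625, -8.4844, -4.7539, -4.6758]
probs = softmax(sample)

print(probs)       # every entry lies between 0 and 1
print(sum(probs))  # approximately 1.0
```

Since the values in the question are negative or well above 1.0, they cannot already be the output of a softmax like this one.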


I believe all the processing functions, including top-k and temperature operations, can be found in the file transformers/generation/logits_process.py. They are called by class GenerationMixin.
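As a rough sketch of the kind of processing those functions perform, here is a plain-Python approximation of temperature scaling followed by top-k filtering. The helper names `apply_temperature` and `apply_top_k` are made up for this example; the library implements these steps as classes operating on tensors, so treat this as an illustration of the idea rather than the actual code.

```python
import math

def apply_temperature(logits, temperature):
    """Divide every logit by the temperature (values < 1 sharpen the distribution)."""
    return [x / temperature for x in logits]

def apply_top_k(logits, k):
    """Keep the k largest logits and mask the rest with -inf."""
    k = min(k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

logits = [2.0, 1.0, 0.5, -1.0]
scores = apply_top_k(apply_temperature(logits, temperature=0.5), k=2)

print(scores)  # [4.0, 2.0, -inf, -inf]
```

With processing like this, the processed values differ from the raw logits both in scale and in which entries are masked out.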

Scores are equal to logits in the case of greedy decoding, but differ for more elaborate decoding methods such as contrastive decoding (in that case, the scores of the selected tokens are the logits after contrastive penalties and re-ranking have been applied).

The way logits are processed into scores depends on the decoding algorithm (see this for an overview).


It seems they are the same when using greedy decoding:

import torch

# Assumes `model` and `tokenizer` have already been loaded.
input_text = "What would happen if I eat Kharboze and honey together?"

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Ask generate() to return both the raw logits and the processed scores.
gen_utilities = {
    "return_dict_in_generate": True,
    "output_scores": True,
    "output_logits": True,
    # "output_hidden_states": True,
    # "output_attentions": True,
}

# Settings for several decoding strategies; only "greedy" is used below.
gen_sampling = {
    "greedy": {"do_sample": False, "temperature": 1.0, "top_k": None, "top_p": None},
    "beam": {"num_beams": 3},
    "top_k": {"do_sample": True, "top_k": 50, "temperature": 1.0},
    "top_p": {"do_sample": True, "top_p": 0.8, "temperature": 0.7},
    "typical": {"typical_p": 0.95, "temperature": 1.0},
    "contrastive": {"penalty_alpha": 0.6, "top_k": 50},
}

max_new_tokens = 20

gen_out = model.generate(**inputs, **gen_utilities, **gen_sampling["greedy"], max_new_tokens=max_new_tokens)

scores = gen_out.scores
logits = gen_out.logits

# Compare step by step; generation can stop before max_new_tokens if EOS is produced.
for step_logits, step_scores in zip(logits, scores):
    print(torch.equal(step_logits, step_scores))

Output:

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True