How to get 'sequences_scores' from 'scores' in 'generate()' method

Hi all

I have some questions about how to use .generate() with BART or other pre-trained models. The example code is:

from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
path = 'facebook/bart-large'
model = BartForConditionalGeneration.from_pretrained(path)
tokenizer = BartTokenizer.from_pretrained(path)

ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')

# Generate Summary
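summary_ids = model.generate(
    inputs['input_ids'],
    num_beams=4, num_return_sequences=2, max_length=5, early_stopping=True,
    output_scores=True, return_dict_in_generate=True,
)
# Inspect what generate() returned
print(summary_ids.keys())
print(summary_ids['sequences'])
print(summary_ids['sequences_scores'])
print(summary_ids['scores'][0].size())
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False)
       for g in summary_ids['sequences']])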

Then, the output is,

odict_keys(['sequences', 'sequences_scores', 'scores'])
tensor([[   2, 2387, 2387,  964,    2],
        [   2, 2387,    4,    4,    2]])
tensor([-0.8599, -0.9924])
torch.Size([4, 50265])
['MyMy friends', 'My..']

Do not worry about the poor output, ['MyMy friends', 'My..'], since I am only trying to understand how this works. My questions are:

  1. return_dict_in_generate=True returns ['sequences'], but together with output_scores=True it returns ['sequences', 'sequences_scores', 'scores']. There are other arguments as well, such as output_attentions and output_hidden_states. The BartForConditionalGeneration documentation does not explain anything about .generate(). Searching further, I found Utilities for Generation (transformers 4.5.0.dev0 documentation), which seems to cover the outputs of .generate(), and the Hugging Face transformers model documentation, which seems to cover the general methods of the base class PreTrainedModel. But I could not find any documentation explaining what each of ['sequences', 'sequences_scores', 'scores'] means or how it is computed. Where is the documentation for this?
  2. Is sequences_scores computed as \sum_{t} \log p(y_{t} | x, y_{<t})?
  3. How do you get sequences_scores from scores? My initial guess was to apply softmax to scores along dim=1 and then take topk with k=1, but this gives me a very weird answer.
import torch

sm = torch.nn.functional.softmax(summary_ids['scores'][0], dim=1)
topk = sm.topk(k=1, dim=1)

which comes out as

tensor([[1.2851e-04, 8.8341e-12, 2.4085e-06,  ..., 3.9426e-12, 2.8815e-12,
        [1.9899e-05, 1.9899e-05, 1.9899e-05,  ..., 1.9899e-05, 1.9899e-05,
        [1.9899e-05, 1.9899e-05, 1.9899e-05,  ..., 1.9899e-05, 1.9899e-05,
        [1.9899e-05, 1.9899e-05, 1.9899e-05,  ..., 1.9899e-05, 1.9899e-05,
        [   0],
        [   0],
        [   0]]))
tensor([   2, 2387, 2387,  964,    2])

The first token, 2387, appears to be correct, but from the second token onward every probability is 1.9899e-05, which is roughly 1/len(tokenizer). This suggests that all tokens are equally likely to be generated. So, how do you get sequences_scores from scores?
  4. How do I get the conditional probability of each output token? For example, if .generate() outputs [I, am, student], how do I get [Pr(I | x), Pr(am | x, I), Pr(student | x, I, am)]? Initially I thought this was 'scores', but now I am not sure.
  5. Since I find it difficult to find documentation on .generate() or on any of the points above, is this something that experienced NLP researchers or programmers would just be able to guess?
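
For questions 2 and 4, here is a minimal sketch of the general idea, not the library's implementation. If each decoding step yields a row of logits over the vocabulary, a log-softmax of that row gives log-probabilities, the entry for the token actually chosen is log Pr(y_t | x, y_<t), and the sum over steps is the sequence log-probability. The tiny vocabulary, the logits, and the helper names `log_softmax` and `token_log_probs` are made up for illustration. With beam search the rows of each `scores` tensor correspond to beams, so the real bookkeeping is more involved.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for one row of logits (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def token_log_probs(step_logits, chosen_ids):
    """Pick log Pr(y_t | x, y_<t) for the token chosen at each step."""
    return [log_softmax(row)[tok] for row, tok in zip(step_logits, chosen_ids)]

# Hypothetical 3-token vocabulary and two decoding steps (illustrative numbers only).
step_logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2]]
chosen_ids = [0, 1]

log_probs = token_log_probs(step_logits, chosen_ids)
cond_probs = [math.exp(lp) for lp in log_probs]  # Pr(y_t | x, y_<t) for each step
sequence_log_prob = sum(log_probs)               # sum over t of log Pr(y_t | x, y_<t)

print(cond_probs)
print(sequence_log_prob)
```

Exponentiating `sequence_log_prob` gives back the product of the per-step conditional probabilities, which is why the sum of logs is the usual way to score a whole sequence.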

Thank you in advance


I am having the same problem as in 4. @patrickvonplaten , can you help us with this? Thanks!


I am having the same issue and would like an answer if possible. I have looked through the code and the documentation, but it is not clear how to get the various scores asked about in the original question.
@sgugger, sorry for tagging, but I am not sure who else could help here.


Hello @ktr0921
I am also trying to use sequences_scores and scores, but I am not sure how to interpret them.
For sequences_scores, I am getting values ranging from -0.000550318 to -0.027084468. How do I get a presentable score from these numbers? Can anybody help here?

I’m getting the same issue!
For the input "The carpenter talked to the librarian and asked ", I get the following sequences and sequences_scores:

{' about': tensor(-0.2684),
 ' her': tensor(-0.1288),
 ' him': tensor(-0.1826),
 ' if': tensor(-0.1445),
 ' what': tensor(-0.2695)}

I’m trying to understand why the scores are negative though…
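
On why the values are negative: a log-probability is the logarithm of a number in (0, 1], so it is never positive, and values close to 0 mean high probability. Below is a small sketch that exponentiates the scores reported above to get numbers in (0, 1]. It is only an illustration: since sequences_scores may be length-normalized, the result is not necessarily the exact probability of the sequence.

```python
import math

# sequences_scores copied from the post above, as plain floats.
scores = {" about": -0.2684, " her": -0.1288, " him": -0.1826, " if": -0.1445, " what": -0.2695}

# exp() maps a log-score in (-inf, 0] to a value in (0, 1].
as_probs = {tok: math.exp(s) for tok, s in scores.items()}

# Print the continuations from most to least likely.
for tok, p in sorted(as_probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok!r}: {p:.3f}")
```

The ordering is unchanged by exp(), so the continuation with the score closest to 0 stays the top-ranked one.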



I think the sequences_scores here are the accumulated log probabilities, normalized by the number of tokens on each beam, because the beams may have different numbers of tokens.
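
A minimal sketch of that idea, assuming the score is the summed token log-probabilities divided by `length ** length_penalty`. The function name `normalized_score` and the numbers are hypothetical, and which tokens count toward `length` (for example, whether the prompt is included) may depend on the transformers version.

```python
def normalized_score(token_log_probs, length_penalty=1.0):
    """Sum of token log-probabilities divided by length ** length_penalty."""
    length = len(token_log_probs)
    return sum(token_log_probs) / (length ** length_penalty)

# Two hypothetical beams with different numbers of tokens.
short_beam = [-0.5, -0.7]
long_beam = [-0.5, -0.7, -0.2, -0.3]

for name, beam in [("short", short_beam), ("long", long_beam)]:
    print(name, sum(beam), normalized_score(beam))
```

With the raw sum the longer beam looks worse simply because it has more terms; after normalization it ranks higher than the shorter one in this example.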

Check transformers/ at fe9152f67c61c9af4721fdc9abbc9578acf5f16f · huggingface/transformers · GitHub

cc @patrickvonplaten to confirm


This example shows how sequences_scores is computed from transition_scores:

Look at reconstructed_scores:

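import numpy as np
# Assumes `model`, `inputs`, and `outputs` come from a beam-search call such as
# model.generate(**inputs, num_beams=4, return_dict_in_generate=True, output_scores=True).
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False
)
input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
# If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores.
# Tip: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the
# use case, you might want to recompute it with `normalize_logits=True`.
output_length = input_length + np.sum(transition_scores.numpy() < 0, axis=1)
length_penalty = model.generation_config.length_penalty
reconstructed_scores = transition_scores.sum(axis=1) / (output_length**length_penalty)
print(np.allclose(outputs.sequences_scores, reconstructed_scores))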