Generation Probabilities: How to compute probabilities of output scores for GPT2

It looks like the pull request is here: https://github.com/huggingface/transformers/pull/14654 and it is implemented in transformers v4.16.0

Can you please explain the scores returned by generate in detail, in particular when we use a batch_size > 1?
Why does applying argmax() on scores not give the same tokens as in sequences?
With batch_size > 1, why is the scores shape (batch_size*num_beams, vocab_len) instead of (batch_size, num_beams, vocab_len)? It is really confusing.
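For what it's worth, the flat first dimension simply stacks the beams of each batch element (rows 0..num_beams-1 belong to element 0, the next num_beams rows to element 1, and so on), so you can recover the (batch_size, num_beams, vocab_len) view with a plain reshape. A minimal torch-only sketch with synthetic tensors (shapes taken from the question, not from a real generate call):

```python
import torch

batch_size, num_beams, vocab_len = 2, 4, 10

# One generation step's scores, laid out the way generate returns them:
# rows 0..3 are the beams of batch element 0, rows 4..7 those of element 1.
step_scores = torch.randn(batch_size * num_beams, vocab_len)

# Recover the per-batch, per-beam view with a reshape.
per_beam = step_scores.view(batch_size, num_beams, vocab_len)

assert torch.equal(per_beam[0, 2], step_scores[2])  # beam 2 of element 0
assert torch.equal(per_beam[1, 0], step_scores[4])  # beam 0 of element 1
```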

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("t5-small")
pad_index = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
unk_index = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)
eos_index = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

model.resize_token_embeddings(len(tokenizer))
model.to("cuda")

# sequences
seq1 = "summarize: I am confused! I am confused"
seq2 = "summarize: why generate does not work with batch_size >1"

# encoding input and attention mask
encoding = tokenizer(
    [seq1, seq2],
    padding="longest",
    max_length=128,
    truncation=True,
    return_tensors="pt",
)

input_ids, attention_mask = encoding.input_ids.to("cuda"), encoding.attention_mask.to("cuda")
output = model.generate(
    input_ids,
    attention_mask=attention_mask,  # needed so padded positions are ignored
    max_length=64,
    early_stopping=False,  # to get len(scores) == the generated max_length
    num_beams=4,
    do_sample=False,
    output_scores=True,
    no_repeat_ngram_size=4,
    return_dict_in_generate=True,
    num_return_sequences=1,
)
tokenizer.batch_decode(output.sequences, skip_special_tokens=True)

# output.sequences
output.sequences
# tensor([[    0,    27,   183, 11319,    55,     1,     0,     0,     0,     0,
#              0,     0,     0,     0],
#         [    0,  3806,   405,    59,   161,    28, 11587,   834,  7991,  2490,
#            536,     3,     5,     1]], device='cuda:0')

# How to get the above indices using output.scores ??
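A note on why a per-step argmax cannot reproduce the sequences above: with beam search, each scores[t] holds per-beam scores, and the beams are reordered (and sometimes dropped) at every step. The returned sequence is stitched together by following the beam it survived on, not by taking the best token of a fixed row. Recent transformers versions expose this bookkeeping directly: beam search returns output.beam_indices, and model.compute_transition_scores(output.sequences, output.scores, output.beam_indices) recovers the per-step scores of the returned sequences (check your installed version's docs for availability). A torch-only toy sketch of the bookkeeping itself, with all numbers invented for illustration:

```python
import torch

# Toy beam-search bookkeeping for one batch element: num_beams=2, 3 steps,
# a vocabulary of 5 tokens. All values are made up for illustration.
num_beams, vocab_len, steps = 2, 5, 3
torch.manual_seed(0)
scores = [torch.randn(num_beams, vocab_len) for _ in range(steps)]

# Suppose the winning sequence made these choices per step:
beam_indices = torch.tensor([0, 1, 1])  # which beam the token was scored on
token_ids = torch.tensor([3, 1, 4])     # which token was picked at that step

# The per-step scores of the final sequence: index each step's matrix first
# by the beam the sequence lived on at that step, then by the chosen token.
per_step = torch.stack(
    [scores[t][beam_indices[t], token_ids[t]] for t in range(steps)]
)

# A per-step argmax over scores[t] ignores this beam reordering, which is
# why it does not reproduce output.sequences in general.
```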

@patrickvonplaten

Could you elaborate on how you chose gen_probs.prod(-1) as your method of obtaining a single probability per sequence? Why not use gen_probs.mean(-1) for the average per-token probability?
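One way to see the difference: the product of per-token probabilities is the joint probability of the sequence (equivalently, the exponential of the summed log-probs), which is the raw quantity beam search scores by, but it inevitably shrinks as sequences get longer. Averaging (usually done over log-probs) is a length-normalized variant that removes that length penalty. A small numeric sketch with invented probabilities:

```python
import torch

# Per-token probabilities for a short and a long hypothetical sequence
# (values invented for illustration).
short_probs = torch.tensor([0.5, 0.5])
long_probs = torch.tensor([0.6, 0.6, 0.6, 0.6])

# Product of probabilities == exp of summed log-probs: the joint probability.
assert torch.allclose(short_probs.prod(), short_probs.log().sum().exp())

# The product penalizes length: the longer sequence scores lower overall...
assert long_probs.prod() < short_probs.prod()
# ...while the mean log-prob (per-token average) prefers it, since each of
# its tokens is individually more likely.
assert long_probs.log().mean() > short_probs.log().mean()
```

So prod(-1) answers "how likely is this whole sequence", while a mean answers "how confident is the model per token"; which one you want depends on whether longer outputs should be penalized.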