Showing individual tokens and their corresponding scores during beam search

Hello,

I am using beam search with a pre-trained T5 model for summarization. I would like to visualize the beam search process by showing the tokens with the highest scores, and eventually the chosen beam, as in this diagram:


[Diagram: beam search tree showing candidate tokens and their probabilities at each step]
(Taken from the blog post "How to generate text: using different decoding methods for language generation with Transformers")

I am unsure how I can show the tokens and their corresponding scores.

I followed the discussion "[Announcement] GenerationOutputs: Scores, Attentions and Hidden States now available as outputs to generate" and the PR "Add flags to return scores, hidden states and / or attention weights in GenerationMixin" by SBrandeis (huggingface/transformers#9150).

Following the docs, when calling generate I set return_dict_in_generate=True and output_scores=True:

generated_outputs = model_t5summary.generate(
  input_ids=input_ids.to(device),
  attention_mask=features['attention_mask'].to(device),
  max_length=input_ids.shape[-1] + 2,
  return_dict_in_generate=True,  # return a ModelOutput object instead of a plain tensor
  output_scores=True,            # include the per-step scores in the output
  output_hidden_states=True,
  output_attentions=True,
  no_repeat_ngram_size=2,
  early_stopping=True,
  num_return_sequences=3,
  num_beams=5,
)

Now I have an instance of BeamSearchEncoderDecoderOutput.

If I understand the docs (Utilities for Generation) correctly, scores will give me what I want, but I am unsure how to use them.
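For reference, here is a quick sketch of how I can inspect the output object (field names per the BeamSearchEncoderDecoderOutput docs):

print(generated_outputs.sequences.shape)   # (batch_size * num_return_sequences, sequence_length)
print(generated_outputs.sequences_scores)  # final beam score of each returned sequence
print(len(generated_outputs.scores))       # one score tensor per generated step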

Any help/pointers from the community would be greatly appreciated, thank you :pray:


Tagging @patrickvonplaten: I reposted my question from GitHub. Thanks for directing me to the forum!

Hey @monmanuela,

Good question!

So in the case of beam_search, the scores correspond to the log probability of each word plus the sum of the log probabilities of all previously generated tokens in that beam.

So regarding the image, this means that scores[0][0] will correspond to the log probabilities of all possible words in the vocabulary. Assuming your vocab consists only of "dog", "nice", and "car", and the probabilities are the same as in the diagram, the values would correspond to log(0.4), log(0.5), log(0.1).

scores[1][0] then corresponds to the log probability of the word chosen at time step one (e.g. "dog") plus the log probabilities of all possible next words, so:
log(0.4) + log(0.05), log(0.4) + log(0.05), log(0.4) + log(0.9), using the diagram above again.
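To make this concrete, here is a minimal sketch (assuming the generated_outputs object from the question above and a tokenizer matching model_t5summary are in scope) that prints the top-3 candidate tokens and their probabilities at the first step, where the stored values are still plain log probabilities:

import torch

# scores is a tuple with one tensor per generated step; each tensor
# has shape (batch_size * num_beams, vocab_size)
step0_logprobs = generated_outputs.scores[0][0]  # first step, beam 0

top = torch.topk(step0_logprobs, k=3)
for logprob, token_id in zip(top.values, top.indices):
    token = tokenizer.decode([int(token_id)])
    print(f"{token!r}: p = {float(logprob.exp()):.3f}")

For later steps, recovering a per-token log probability would mean subtracting the beam's cumulative score from the previous step, since (per the above) the stored values include it.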


Hi @patrickvonplaten, thanks for your detailed answer! I have managed to manually visualize the beam search process thanks to your help :blush:
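For anyone landing here later, a minimal sketch of one way such a visualization can look (reusing generated_outputs from the first post and assuming a matching tokenizer; note that for steps after the first these are cumulative beam scores, not plain log probabilities):

import torch

# print the top-3 candidate continuations of beam 0 at every step
for t, step_scores in enumerate(generated_outputs.scores):
    top = torch.topk(step_scores[0], k=3)
    candidates = [
        f"{tokenizer.decode([int(i)])!r} ({float(v):.2f})"
        for v, i in zip(top.values, top.indices)
    ]
    print(f"step {t}: " + " | ".join(candidates))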

Hi @patrickvonplaten, can you add more details? Based on your example, I verified that

torch.sum(torch.exp(aa.scores[0][0])) == 1

but for the second token:

torch.sum(torch.exp(aa.scores[1][0] - aa.scores[0][0].max())) != 1

which is wrong, right?


Hi, I have the same findings as @aspriter. Do we know what scores[0][1] actually corresponds to?
torch.sum(torch.exp(scores[0][0])) is ~1
torch.sum(torch.exp(scores[0][1])) is also ~1.

len(scores[0]) seems to be the number of beams
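A quick shape check (reusing generated_outputs from the first post) makes this concrete:

print(generated_outputs.scores[0].shape)
# (batch_size * num_beams, vocab_size) -> 5 rows here, one per beam;
# scores[0][1] is simply the first-step score row for beam 1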