Showing individual tokens and their corresponding scores during beam search

Hello,

I am using beam search with a pre-trained T5 model for summarization. I would like to visualize the beam search process by showing the tokens with the highest scores, and eventually the chosen beam, as in this diagram:


[Diagram: beam search tree showing candidate tokens and their probabilities at each step]
(Taken from the blog post "How to generate text: using different decoding methods for language generation with Transformers")

I am unsure how I can show the tokens and their corresponding scores.

I followed the discussion "[Announcement] GenerationOutputs: Scores, Attentions and Hidden States now available as outputs to generate" and the PR "Add flags to return scores, hidden states and / or attention weights in GenerationMixin" by SBrandeis (huggingface/transformers#9150).

Following the docs, when calling generate I set return_dict_in_generate=True and output_scores=True:

generated_outputs = model_t5summary.generate(
  input_ids=input_ids.to(device),
  attention_mask=features['attention_mask'].to(device),
  max_length=input_ids.shape[-1] + 2,
  return_dict_in_generate=True,  # return a ModelOutput object instead of a plain tensor
  output_scores=True,            # include the per-step scores in the output
  output_hidden_states=True,
  output_attentions=True,
  no_repeat_ngram_size=2,
  early_stopping=True,
  num_return_sequences=3,
  num_beams=5,
)

Now I have an instance of BeamSearchEncoderDecoderOutput.

If I understand the docs (Utilities for Generation) correctly, scores will give me what I want, but I am unsure how to use them.
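For reference, here is a quick sketch of how I can inspect the output object (field names per the BeamSearchEncoderDecoderOutput docs):

print(generated_outputs.sequences.shape)   # (batch_size * num_return_sequences, sequence_length)
print(generated_outputs.sequences_scores)  # final beam score of each returned sequence
print(len(generated_outputs.scores))       # one score tensor per generated step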

Any help/pointers from the community would be greatly appreciated, thank you :pray:


Tagging @patrickvonplaten: I reposted my question from GitHub. Thanks for directing me to the forum!

Hey @monmanuela,

Good question!

So in the case of beam_search, the scores correspond to the log probability of each word plus the sum of the log probabilities of all previously generated tokens in that beam.

So regarding the image, this means that scores[0][0] will correspond to the log probabilities of all possible words in the vocabulary. Assuming your vocab consists only of "dog", "nice", and "car", and the probabilities are the same as in the diagram, the values would correspond to log(0.4), log(0.5), log(0.1).

scores[1][0] then corresponds to the log probability of the word chosen at time step one (e.g. "dog") plus the log probabilities of all possible next words, so:
log(0.4) + log(0.05), log(0.4) + log(0.05), log(0.4) + log(0.9), using the diagram above again.
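To make this concrete, here is a minimal sketch (assuming the generated_outputs object from the question above and a tokenizer matching model_t5summary are in scope) that prints the top-3 candidate tokens and their probabilities at the first step, where the stored values are still plain log probabilities:

import torch

# scores is a tuple with one tensor per generated step; each tensor
# has shape (batch_size * num_beams, vocab_size)
step0_logprobs = generated_outputs.scores[0][0]  # first step, beam 0

top = torch.topk(step0_logprobs, k=3)
for logprob, token_id in zip(top.values, top.indices):
    token = tokenizer.decode([int(token_id)])
    print(f"{token!r}: p = {float(logprob.exp()):.3f}")

For later steps, recovering a per-token log probability would mean subtracting the beam's cumulative score from the previous step, since (per the above) the stored values include it.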


Hi @patrickvonplaten, thanks for your detailed answer! I have managed to manually visualize the beam search process thanks to your help :blush:
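For anyone landing here later, a minimal sketch of one way such a visualization can look (reusing generated_outputs from the first post and assuming a matching tokenizer; note that for steps after the first these are cumulative beam scores, not plain log probabilities):

import torch

# print the top-3 candidate continuations of beam 0 at every step
for t, step_scores in enumerate(generated_outputs.scores):
    top = torch.topk(step_scores[0], k=3)
    candidates = [
        f"{tokenizer.decode([int(i)])!r} ({float(v):.2f})"
        for v, i in zip(top.values, top.indices)
    ]
    print(f"step {t}: " + " | ".join(candidates))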

Hi @patrickvonplaten, can you add more details? Based on your example, I verified that

torch.sum(torch.exp(aa.scores[0][0])) == 1

but for the second token:

torch.sum(torch.exp(aa.scores[1][0] - aa.scores[0][0].max())) != 1

which is wrong, right?


Hi, I have the same findings as @aspriter. Do we know what scores[0][1] actually corresponds to?
torch.sum(torch.exp(scores[0][0])) is ~1
torch.sum(torch.exp(scores[0][1])) is also ~1.

len(scores[0]) seems to be the number of beams
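A quick shape check (reusing generated_outputs from the first post) makes this concrete:

print(generated_outputs.scores[0].shape)
# (batch_size * num_beams, vocab_size) -> 5 rows here, one per beam;
# scores[0][1] is simply the first-step score row for beam 1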