How to find the beam search score for any target output? (BartForConditionalGeneration)


I’m writing a function that calculates $\log \mathbb{P}_{\text{model}}(\text{target} \mid \text{input})$ for any input and target, i.e. the quantity reported as sequences_scores in the beam search output (up to length normalisation). I tried two approaches:

  • Adapting the code of greedy_search() so that at each step it takes the next target token as next_tokens. The result is the sum of the log probs from output.scores divided by $\text{len(target)}^{\text{length\_penalty}}$.
  • Adapting the code of beam_search() to accept num_beams=1, then using the target tokens as beam_next_tokens at each step.
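Both approaches amount to teacher-forced scoring: feed the gold prefix, read off the log-probability of each target token, and sum. Below is a minimal plain-Python sketch of that computation so it runs without a model. It assumes you already have one log-softmax vector over the vocabulary per decoding step (for instance from a single forward pass with the target fed as decoder input). The function score_target, its length_offset parameter, and the toy numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def score_target(
    step_log_probs: Sequence[Sequence[float]],
    target_ids: Sequence[int],
    length_penalty: float = 1.0,
    length_offset: int = 0,
) -> Tuple[float, float]:
    """Return (sum of target log-probs, length-normalised score).

    step_log_probs[t] is a log-softmax vector over the vocabulary at step t,
    computed with the gold prefix target_ids[:t] fed to the decoder.
    length_offset is a hypothetical knob: set it to 1 to count one extra
    position (e.g. a decoder start token) in the normalising length.
    """
    if len(step_log_probs) != len(target_ids):
        raise ValueError("need exactly one log-prob vector per target token")
    # Chain rule: log P(y | x) = sum_t log P(y_t | y_<t, x)
    total = sum(dist[tok] for dist, tok in zip(step_log_probs, target_ids))
    length = len(target_ids) + length_offset
    return total, total / (length ** length_penalty)


# Toy example: vocabulary of 3 tokens, target = [2, 0]
steps = [
    [math.log(0.1), math.log(0.2), math.log(0.7)],
    [math.log(0.6), math.log(0.3), math.log(0.1)],
]
total, score = score_target(steps, [2, 0], length_penalty=1.0)
print(total, score)
```

length_offset only exists to make it easy to try both conventions for the length. Which one matches sequences_scores depends on how BeamHypotheses.add measures the hypothesis length in the installed transformers version, so it is worth checking there.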

But when I used the sequences returned by generate() for a sanity check, both methods gave results that differed from the returned scores by $O(10^{-3})$. I also noticed that the beam search scores are not “deterministic”: they seem to be influenced by the results of other beams. As shown in the figure, 9 out of 10 beams find the same sequence but yield different scores.

What subtleties in the beam search calculation might explain this? I’ve also read the code of BeamSearchScorer and BeamHypotheses but couldn’t figure out the reason. Alternatively, is there a correct way to estimate this probability?

Moreover, besides the normalisation by $\text{len(target)}^{\text{length\_penalty}}$, what other factors can cause a sequence’s probability to differ between beam_search and greedy_search?
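One way to isolate the normalisation is to convert a reported sequences_scores value back into a summed log-probability before comparing it with a teacher-forced sum. A small sketch, assuming the beam search score has the form below, where $T$ is whatever length the scorer counts and $\alpha$ is length_penalty (both symbols are notation introduced here, and the helper names unnormalise and mismatch are hypothetical):

$$
\text{score}(y \mid x) = \frac{1}{T^{\alpha}} \sum_{t=1}^{T} \log \mathbb{P}_{\text{model}}(y_t \mid y_{<t}, x)
$$

```python
def unnormalise(score: float, length: int, length_penalty: float) -> float:
    """Invert score = sum_logprobs / length ** length_penalty (assumed form)."""
    return score * (length ** length_penalty)


def mismatch(my_sum: float, beam_score: float, length: int, length_penalty: float) -> float:
    """Absolute gap between a teacher-forced sum and an un-normalised beam score."""
    return abs(my_sum - unnormalise(beam_score, length, length_penalty))


# Hypothetical numbers: try two candidate lengths for the same pair of values.
my_sum, beam_score = -4.2, -0.7
for candidate_length in (5, 6):
    print(candidate_length, mismatch(my_sum, beam_score, candidate_length, 1.0))
```

If the gap nearly vanishes for a neighbouring length, the two computations likely disagree on what the length counts; if it persists for every plausible length, the difference lies in the per-step log-probabilities themselves rather than in the normalisation.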

Thanks in advance for your help.
