Hi,
I’m writing a function that computes $\log \mathbb{P}_{model}(target \mid input)$, i.e. the `sequences_scores` in the beam search output, for an arbitrary `input` and `target`. I tried the following two approaches:
- Adapt the code of `greedy_search()` so that at each step it takes the corresponding token of `target` as `next_tokens`. The result is the sum of log probs from `output.scores`, divided by `len(target)^{length_penalty}` (a sketch of what this computes follows below).
- Adapt the code of `beam_search()` to accept `num_beams=1`, then use the `target` tokens as `beam_next_tokens` at each step.
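For concreteness, the first approach should be equivalent to a plain forward pass with teacher forcing, roughly like the sketch below (the checkpoint name is just a placeholder for my actual model):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# placeholder checkpoint; any encoder-decoder model should behave the same way
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

def target_logprob(input_text: str, target_text: str, length_penalty: float = 1.0) -> float:
    inputs = tokenizer(input_text, return_tensors="pt")
    labels = tokenizer(target_text, return_tensors="pt").input_ids
    with torch.no_grad():
        # passing labels runs the decoder with teacher forcing,
        # so logits[:, i] is the predicted distribution for labels[:, i]
        logits = model(**inputs, labels=labels).logits
    logprobs = torch.log_softmax(logits, dim=-1)
    # log prob of each target token under the model
    token_logprobs = logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # the same normalisation that beam search applies to sequences_scores
    return (token_logprobs.sum() / labels.shape[-1] ** length_penalty).item()
```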
However, when I used the sequences returned by `generate()` for a sanity check, both methods gave scores that differ from `sequences_scores` by about $O(10^{-3})$. I also noticed that the beam search scores are not “deterministic”: they are influenced by the results of the other beams. As shown in the figure, 9 out of 10 beams find the same sequence but yield different scores.
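For reference, the sanity check is roughly the following (again just a sketch; `compute_transition_scores`, if your transformers version has it, re-derives the per-token scores that go into `sequences_scores`):

```python
# sketch of the sanity check against generate()'s sequences_scores
gen = model.generate(
    **tokenizer("some input text", return_tensors="pt"),
    num_beams=10,
    num_return_sequences=10,
    output_scores=True,
    return_dict_in_generate=True,
    length_penalty=1.0,
)
# per-step log probs of the chosen tokens, reassembled across beams
transition_scores = model.compute_transition_scores(
    gen.sequences, gen.scores, gen.beam_indices, normalize_logits=False
)
# these are the scores that differ from my two methods by ~1e-3
print(gen.sequences_scores)
```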
I would like to know what subtleties in the beam search calculation might cause this. I’ve also read the code of `BeamSearchScorer` and `BeamHypotheses` but couldn’t figure out the reason. Or is there a correct way to do this probability estimation?
Moreover, besides the normalisation by `len(target)^{length_penalty}`, what other factors can produce a mismatch between a sequence’s probability as computed in `beam_search` and in `greedy_search`?
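To be explicit, the relation I’m assuming (and trying to reproduce) is

$$
\text{sequences\_scores} = \frac{1}{\mathrm{len}(target)^{length\_penalty}} \sum_{t=1}^{\mathrm{len}(target)} \log \mathbb{P}_{model}(target_t \mid target_{<t}, input),
$$

so please correct me if this formula is already wrong.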
Thanks in advance for your help.