I am confused about two observations regarding beam search:
From reading @patrickvonplaten's blog post [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate), it is my understanding that each beam in `BeamSearchEncoderDecoderOutput` should begin with a different token, or am I wrong in that assertion?
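
For context, here is roughly how I'm producing the output I'm asking about (the model, prompt, and generation parameters are just placeholders). I look at the second position of `sequences` because, as far as I understand, the first position is the decoder start token:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# t5-small is only an example model
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,  # return all beams
    max_length=20,
    return_dict_in_generate=True,
    output_scores=True,
)

# outputs.sequences has shape (num_return_sequences, sequence_length);
# index 0 is the decoder start token, index 1 is the first generated token.
print(outputs.sequences[:, 1])
```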
The documentation for the `sequences` parameter states that "The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early due to the `eos_token_id`." In my observations it has always been longer than `max_length`. How come?
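
For the second point, this is the kind of check I'm doing (reusing `outputs` from the snippet above):

```python
# Compare the returned sequence length with the max_length
# that was passed to generate (20 in the snippet above).
max_length = 20
print(outputs.sequences.shape[-1], "tokens returned vs. max_length =", max_length)
```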