Beam Search: Why do some beams begin with the same token?

I am confused about two observations regarding beam search:

  1. From reading @patrickvonplaten's blog post [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate), my understanding is that each beam in BeamSearchEncoderDecoderOutput should begin with a different token. Am I wrong in that assertion? (A small snippet showing what I am doing follows the list.)

  2. The documentation for BeamSearchEncoderDecoderOutput states for the sequences parameter: “The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.” In my observations it has always been longer than max_length. How come?

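In case it helps, here is a minimal sketch of what I am running. The t5-small checkpoint, the prompt, and the generation settings below are just placeholders for my actual setup:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

max_length = 20
outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,
    max_length=max_length,
    return_dict_in_generate=True,
    output_scores=True,
)

# Observation 1: look at the first generated token of each returned beam.
# outputs.sequences has shape (num_return_sequences, sequence_length);
# position 0 is the decoder start token, so position 1 is the first generated token.
for i, seq in enumerate(outputs.sequences):
    print(f"beam {i}: first tokens = {seq[:3].tolist()} -> "
          f"{tokenizer.decode(seq, skip_special_tokens=True)!r}")

# Observation 2: compare the returned sequence length with max_length.
print("sequence_length:", outputs.sequences.shape[-1], "max_length:", max_length)
```
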
Thanks :slight_smile: