I am confused about two observations regarding beam search:
From reading @patrickvonplaten's blog post [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate), it is my understanding that each beam in `BeamSearchEncoderDecoderOutput` should begin with a different token, or am I wrong in that assertion?
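
For context, here is roughly how I'm producing the output I'm asking about (the model, prompt, and generation parameters are just placeholders). I look at the second position of `sequences` because, as far as I understand, the first position is the decoder start token:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# t5-small is only an example model
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,  # return all beams
    max_length=20,
    return_dict_in_generate=True,
    output_scores=True,
)

# outputs.sequences has shape (num_return_sequences, sequence_length);
# index 0 is the decoder start token, index 1 is the first generated token.
print(outputs.sequences[:, 1])
```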
The documentation for the `sequences` parameter states that "The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early due to the `eos_token_id`." In my observations it has always been longer than `max_length`. How come?
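
For the second point, this is the kind of check I'm doing (reusing `outputs` from the snippet above):

```python
# Compare the returned sequence length with the max_length
# that was passed to generate (20 in the snippet above).
max_length = 20
print(outputs.sequences.shape[-1], "tokens returned vs. max_length =", max_length)
```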