EncoderDecoderModel Generation with Specified EOS Token


I have been working with the EncoderDecoderModel (with bert-base-chinese) for Seq2Seq generation. I have noticed that during output generation, if I were to explicitly define the EOS token, like below:


then the following message is printed: “Setting `pad_token_id` to `eos_token_id`:102 for open-end generation.” Furthermore, I have noticed that the generated sequences are longer overall than when I omit the `eos_token_id` argument entirely.

I am wondering:

  1. What is the message about? Specifically, what is open-end generation?
  2. What might be some reasons that setting `eos_token_id` increases the generated sequence length? I would expect the opposite: by explicitly denoting the EOS token and terminating generation once it is produced, the generated sequence should be shorter.

Thanks in advance.