Hi,
I have been working with the EncoderDecoderModel (with bert-base-chinese) for Seq2Seq generation. I have noticed that during output generation, if I were to explicitly define the EOS token, like below:
then the following message is printed: “Setting `pad_token_id` to `eos_token_id`:102 for open-end generation.” Furthermore, I have noticed that the overall generated sequence is longer than when I omit the “eos_token_id” argument.
I am wondering:
- What is the message about? Specifically, what is open-end generation?
- Why might setting the “eos_token_id” increase the generated sequence length? I would expect that explicitly specifying the EOS token, and terminating generation once it is produced, would make the generated sequence shorter, not longer.
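To make my expectation in the second question concrete, here is a toy greedy-decoding loop in plain Python. It is only a sketch of how I assume stopping at EOS works, not the actual generation code of the library; `toy_generate`, `scripted_steps`, and every token id other than 102 are made up for illustration.

```python
from typing import List, Optional


def toy_generate(scripted_steps: List[int], max_length: int,
                 eos_token_id: Optional[int] = None) -> List[int]:
    """Emit tokens from a fixed script until max_length or (optionally) EOS.

    Hypothetical stand-in for a decoding loop: the "model" is just a
    pre-written list of token ids, so the behaviour is fully deterministic.
    """
    output: List[int] = []
    for token in scripted_steps:
        if len(output) >= max_length:
            break  # length limit reached
        output.append(token)
        if eos_token_id is not None and token == eos_token_id:
            break  # stop as soon as the EOS token has been emitted
    return output


# Assumed script: the "model" produces EOS (102) as its third token.
scripted_steps = [7, 8, 102, 9, 10, 11]

without_eos = toy_generate(scripted_steps, max_length=6)
with_eos = toy_generate(scripted_steps, max_length=6, eos_token_id=102)

print(len(without_eos), len(with_eos))  # prints: 6 3
```

In this toy version, passing the EOS id can only make the output shorter or leave it unchanged, which is why the longer sequences I am seeing surprise me.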
Thanks in advance.