Due to recent code changes by @sshleifer, I am trying to understand what the desired input for BART is for training and generation, and whether the codebase reflects it properly, as I've encountered some inconsistencies.
I am assuming both `src_ids` and `tgt_ids` are encoded with a BART tokenizer, and therefore have the format `[bos, token1, token2, ..., eos]`.
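For concreteness, this is a minimal sketch of the assumption I'm making (using `facebook/bart-large` just as an example checkpoint):

```python
from transformers import BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
ids = tok("Hello world").input_ids

# Expected format: [bos, token1, token2, ..., eos],
# i.e. the sequence starts with <s> (id 0) and ends with </s> (id 2).
print(ids[0] == tok.bos_token_id, ids[-1] == tok.eos_token_id)
```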
Looking at transformers/examples/seq2seq/finetune.py#L151, `decoder_input_ids = shift_tokens_right(tgt_ids)` means that `eos` will be the first token and `bos` will be the second token.
This has an effect on generation (illustrated in the sketch after this list):
- We need `decoder_start_token_id = eos`.
- The first actually generated token (i.e. the one after `decoder_start_token_id`) will be `bos`.
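To make the shift concrete, here is my own minimal illustration of the rotation (not the library's actual implementation, which operates on padded batched tensors):

```python
bos, eos = 0, 2  # BART's <s> and </s> ids

tgt_ids = [bos, 8774, 232, eos]  # [bos, token1, token2, eos]

def shift_tokens_right(ids):
    # Move the last token (eos) to the front and shift everything else right.
    return [ids[-1]] + ids[:-1]

decoder_input_ids = shift_tokens_right(tgt_ids)
print(decoder_input_ids)  # [eos, bos, token1, token2]
```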
The default value for `decoder_start_token_id` is missing from facebook/bart-large-mnli's config, which means it falls back to `bos`. The other BART models have `decoder_start_token_id` set to `eos`. Why the difference? It looks to me like using finetune.py with bart-large-mnli will not generate as intended.
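As a quick way to see the difference (and a possible workaround), one can inspect the configs and pass `decoder_start_token_id` explicitly to `generate`. A sketch under those assumptions:

```python
from transformers import AutoConfig, BartForConditionalGeneration, BartTokenizer

# Compare what each checkpoint declares as its decoder start token.
for name in ["facebook/bart-large-mnli", "facebook/bart-large-cnn"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, "decoder_start_token_id =", cfg.decoder_start_token_id)

# Workaround: force eos as the decoder start token at generation time,
# regardless of what the config says.
tok = BartTokenizer.from_pretrained("facebook/bart-large-mnli")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-mnli")
batch = tok(["An example source sentence."], return_tensors="pt")
generated = model.generate(
    batch["input_ids"],
    attention_mask=batch["attention_mask"],
    decoder_start_token_id=tok.eos_token_id,
)
print(tok.batch_decode(generated, skip_special_tokens=True))
```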
In fairseq's implementation the equivalent of `decoder_start_token_id` is set to `bos`: fairseq/models/bart/hub_interface.py#L123. Can you please explain why you decided to use the format `[eos, bos, token1, token2, ...]` for `decoder_input_ids` instead of `[bos, token1, token2, ...]`?
Is there still a need for forcing `bos` to be the first generated token? It was introduced in transformers/pull/6526 (new user, can't add another link), back when the first token of `decoder_input_ids` was `bos` and the second was the first regular token of the target sequence (transformers/examples/seq2seq/finetune.py#L144). Using it now shouldn't have any effect, if I understand correctly (because a trained model will easily learn to always output `bos` in this position anyway).
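For reference, this is how I understand the old vs. new target construction in finetune.py (a paraphrase of the two schemes, not the actual code):

```python
# Assuming tgt_ids = [bos, t1, t2, t3, eos] with bos=0, eos=2.
tgt_ids = [0, 8774, 232, 1525, 2]

# Old scheme (the finetune.py#L144 era): drop the last / first token.
old_decoder_input_ids = tgt_ids[:-1]  # [bos, t1, t2, t3]
old_labels            = tgt_ids[1:]   # [t1, t2, t3, eos]

# New scheme (the finetune.py#L151 era): rotate eos to the front, predict the full target.
new_decoder_input_ids = [tgt_ids[-1]] + tgt_ids[:-1]  # [eos, bos, t1, t2, t3]
new_labels            = tgt_ids                       # [bos, t1, t2, t3, eos]

# In the new scheme the model is trained to emit bos right after eos,
# which is why forcing bos at generation time seems redundant.
```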