Due to recent code changes by @sshleifer, I am trying to understand what the desired input for BART is for training and generation, and whether the codebase reflects it properly, as I've encountered some inconsistencies.
I am assuming both `src_ids` and `tgt_ids` are encoded with a BART tokenizer, and therefore have the format `[bos, token1, token2, ..., eos]`.
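For concreteness, this is a minimal sketch of the assumption I'm making (using `facebook/bart-large` just as an example checkpoint):

```python
from transformers import BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
ids = tok("Hello world").input_ids

# Expected format: [bos, token1, token2, ..., eos],
# i.e. the sequence starts with <s> (id 0) and ends with </s> (id 2).
print(ids[0] == tok.bos_token_id, ids[-1] == tok.eos_token_id)
```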
Looking at transformers/examples/seq2seq/finetune.py#L151, `decoder_input_ids = shift_tokens_right(tgt_ids)` means that `eos` will be the first token and `bos` will be the second token.
This has an effect on generation (illustrated in the sketch after this list):
- We need `decoder_start_token_id = eos`.
- The first actually generated token (i.e. the one after `decoder_start_token_id`) will be `bos`.
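To make the shift concrete, here is my own minimal illustration of the rotation (not the library's actual implementation, which operates on padded batched tensors):

```python
bos, eos = 0, 2  # BART's <s> and </s> ids

tgt_ids = [bos, 8774, 232, eos]  # [bos, token1, token2, eos]

def shift_tokens_right(ids):
    # Move the last token (eos) to the front and shift everything else right.
    return [ids[-1]] + ids[:-1]

decoder_input_ids = shift_tokens_right(tgt_ids)
print(decoder_input_ids)  # [eos, bos, token1, token2]
```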
The default value for `decoder_start_token_id` is missing from facebook/bart-large-mnli's config, which means it falls back to `bos`. The other BART models have `decoder_start_token_id` set to `eos`. Why the difference? It looks to me like using finetune.py with bart-large-mnli will not generate as intended.
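As a quick way to see the difference (and a possible workaround), one can inspect the configs and pass `decoder_start_token_id` explicitly to `generate`. A sketch under those assumptions:

```python
from transformers import AutoConfig, BartForConditionalGeneration, BartTokenizer

# Compare what each checkpoint declares as its decoder start token.
for name in ["facebook/bart-large-mnli", "facebook/bart-large-cnn"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, "decoder_start_token_id =", cfg.decoder_start_token_id)

# Workaround: force eos as the decoder start token at generation time,
# regardless of what the config says.
tok = BartTokenizer.from_pretrained("facebook/bart-large-mnli")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-mnli")
batch = tok(["An example source sentence."], return_tensors="pt")
generated = model.generate(
    batch["input_ids"],
    attention_mask=batch["attention_mask"],
    decoder_start_token_id=tok.eos_token_id,
)
print(tok.batch_decode(generated, skip_special_tokens=True))
```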
In fairseq's implementation the equivalent of `decoder_start_token_id` is set to `bos`: fairseq/models/bart/hub_interface.py#L123. Can you please explain why you decided to use the format `[eos, bos, token1, token2, ...]` for `decoder_input_ids` instead of `[bos, token1, token2, ...]`?
Is there still a need for forcing `bos` to be the first generated token? It was introduced in transformers/pull/6526 (new user, can't add another link), back when the first token of `decoder_input_ids` was `bos` and the second was the first regular token of the target sequence (transformers/examples/seq2seq/finetune.py#L144). Using it now shouldn't have any effect, if I understand correctly (because a trained model will easily learn to always output `bos` in this position anyway).
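For reference, this is how I understand the old vs. new target construction in finetune.py (a paraphrase of the two schemes, not the actual code):

```python
# Assuming tgt_ids = [bos, t1, t2, t3, eos] with bos=0, eos=2.
tgt_ids = [0, 8774, 232, 1525, 2]

# Old scheme (the finetune.py#L144 era): drop the last / first token.
old_decoder_input_ids = tgt_ids[:-1]  # [bos, t1, t2, t3]
old_labels            = tgt_ids[1:]   # [t1, t2, t3, eos]

# New scheme (the finetune.py#L151 era): rotate eos to the front, predict the full target.
new_decoder_input_ids = [tgt_ids[-1]] + tgt_ids[:-1]  # [eos, bos, t1, t2, t3]
new_labels            = tgt_ids                       # [bos, t1, t2, t3, eos]

# In the new scheme the model is trained to emit bos right after eos,
# which is why forcing bos at generation time seems redundant.
```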