Is there a way to return the "decoder_input_ids" from "tokenizer.prepare_seq2seq_batch"?

Yes, and you should manually replace the pad token id with -100 in the labels, so that padded positions are ignored by the loss.
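A minimal sketch of that replacement, assuming a BART tokenizer (the checkpoint name and example texts here are placeholders, not from the thread):

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["UN Chief Says There Is No Military Solution in Syria"],
    tgt_texts=["There is no military solution in Syria"],
    return_tensors="pt",
)

# Replace pad token ids with -100 so the cross-entropy loss skips them
labels = batch["labels"].clone()
labels[labels == tokenizer.pad_token_id] = -100
```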

Ideally, yes, it should start with the bos token, but in the original fairseq implementation the models were trained with <eos> <bos> X ..., so we have kept it like that for reproducibility.
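For reference, decoder inputs with that <eos> <bos> X ... pattern can be built from the labels by rotating the final <eos> to the front. The sketch below is modeled on the shift_tokens_right helper found in older versions of the transformers BART code; exact details may differ across versions:

```python
import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Move the last non-pad token (<eos>) to position 0 and shift the rest right,
    turning labels of the form <bos> X ... <eos> into <eos> <bos> X ..."""
    prev_output_tokens = input_ids.clone()
    # Index of the last non-pad token in each row (assumes right padding)
    index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze(-1)
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens
```

Note that this shift has to be applied to the labels before replacing pad tokens with -100, since -100 is a loss-masking value, not a real token id.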