What should decoder_input_ids be when pre-training mBART?

A simple question: when pre-training (denoising) with mBART-50, should decoder_input_ids be based on labels (the clean target text) or on input_ids (the corrupted text)?

In some reference code I found, decoder_input_ids appears to be input_ids shifted right: batch["decoder_input_ids"] = self.shift_tokens_right(batch["input_ids"]).

However, many other sources suggest that decoder_input_ids should always be the labels shifted right (i.e. teacher forcing on the clean text), with the corrupted input_ids only reaching the decoder indirectly, via cross-attention over the encoder outputs. (See also "possible mistake in documentation" · Issue #11357 · huggingface/transformers · GitHub, where it is claimed that the documentation's mention of input_ids is a bug and that labels is what is actually used.)
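
To make the labels-based variant concrete, here is a minimal sketch of what I understand it to look like. The checkpoint name is real, but the example sentence, the manual <mask> corruption, and the use of text_target= (transformers >= 4.21 or so) are just my assumptions for illustration:

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration
from transformers.models.mbart.modeling_mbart import shift_tokens_right

tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50", src_lang="en_XX", tgt_lang="en_XX"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

# Encoder sees the corrupted (noised) text; the clean text is the target.
corrupted = "The <mask> sat on the mat."  # toy corruption, not real span masking
original = "The cat sat on the mat."

input_ids = tokenizer(corrupted, return_tensors="pt").input_ids
labels = tokenizer(text_target=original, return_tensors="pt").input_ids

# Teacher forcing: the decoder input is the *clean* target shifted right,
# so at step t the model predicts labels[t] given labels[:t].
decoder_input_ids = shift_tokens_right(labels, tokenizer.pad_token_id)

outputs = model(
    input_ids=input_ids,
    decoder_input_ids=decoder_input_ids,
    labels=labels,
)
print(outputs.loss)
```

If I read modeling_mbart.py correctly, MBartForConditionalGeneration does this itself anyway: when labels is passed and decoder_input_ids is not, it builds decoder_input_ids by calling shift_tokens_right on labels, which would support the labels-based reading.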

So which one is it? I would be immensely grateful for any clarification or guidance!