The meaning of 'decoder input ids' in encoder-decoder models

Hi. I’m a beginner. It might be a very basic question.
I’m studying T5 and its implementation.

I’m a little confused because an error occurred during inference.

Is ‘decoder_input_ids’ built from the labels only during training?

As far as I understand, the decoder generates output tokens one at a time, attending both to the ‘last output of the encoder’ and to the tokens the decoder has produced so far.

So, my question is:

During training, I understand that the labels are shifted right and fed into the decoder as its input.
Then, at inference time, I should only need to pass the input_ids and the attention mask to the model, right?
Why do I have to specify ‘decoder_input_ids’?

Is “output of the decoder at the current step” = “next input of the decoder”?
Or is “label corresponding to the output of the current step” = “next input of the decoder”?
Which is correct during inference?
Am I missing something in the implementation?

These questions may be confusing, but I would appreciate it if anyone could help me understand.

As I understand it, decoder_input_ids is the beginning of the output sequence that you force the decoder to start from. The encoder-decoder model then continues the sequence from there.
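
To make the difference between training and inference concrete, here is a minimal pure-Python sketch, not the actual T5 code. The names `shift_right`, `greedy_decode`, `toy_next_token` and the token ids in `TARGET` are made up for illustration; `toy_next_token` is a stand-in for one real decoder step (encoder output plus the tokens so far going in, one predicted token coming out). The start id 0 and end-of-sequence id 1 match T5's usual configuration as far as I know, but treat them as assumptions.

```python
from typing import Callable, List

PAD_ID = 0  # assumption: T5 reuses the pad token as the decoder start token
EOS_ID = 1  # assumption: T5's end-of-sequence token id


def shift_right(labels: List[int], start_id: int = PAD_ID) -> List[int]:
    """Training (teacher forcing): the decoder input is the labels shifted
    right by one position, with the start token placed in front."""
    return [start_id] + labels[:-1]


def greedy_decode(
    next_token: Callable[[List[int]], int],
    start_id: int = PAD_ID,
    eos_id: int = EOS_ID,
    max_len: int = 10,
) -> List[int]:
    """Inference: there are no labels, so decoding begins from the start
    token alone and each predicted token is appended to the decoder input."""
    decoder_input_ids = [start_id]
    for _ in range(max_len):
        token = next_token(decoder_input_ids)  # output of the current step
        decoder_input_ids.append(token)        # becomes part of the next input
        if token == eos_id:
            break
    return decoder_input_ids[1:]  # drop the start token


# Hypothetical stand-in for a real decoder: it always "predicts" TARGET.
TARGET = [37, 1782, 19, EOS_ID]


def toy_next_token(prefix: List[int]) -> int:
    return TARGET[len(prefix) - 1]


print(shift_right(TARGET))            # [0, 37, 1782, 19]
print(greedy_decode(toy_next_token))  # [37, 1782, 19, 1]
```

So during inference it is the first option in the question: the output of the current step becomes the next decoder input, and the labels play no role. In the Hugging Face implementation, as far as I know, `generate()` runs a loop like this for you and creates the initial `decoder_input_ids` itself, whereas calling the model's forward pass directly expects you to pass `decoder_input_ids` (or `labels`), which is a likely source of the error.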