The meaning of 'decoder input ids' in encoder-decoder model

macto · May 24, 2022, 5:27pm

Hi. I’m a beginner. It might be a very basic question.
I’m studying T5 and its implementation.

I’m a little confused because an error occurred during inference.

Is ‘decoder_input_ids’ built from the labels only during training?

As far as I understand, the decoder generates its output tokens one at a time, attending to both the last hidden states of the encoder and the tokens the decoder has produced so far.

So, my question is:

During training, I understand that the labels are shifted right and fed to the decoder as its input.
Then, at inference time, I should only need to pass the input_ids and the attention mask to the model, shouldn’t I?
Why do I have to specify ‘decoder_input_ids’?
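For reference, the label shifting used in training can be sketched in plain Python. This is only an illustration, not the library's code: `shift_right`, `PAD_ID`, and the sample label ids are hypothetical names and values, and it assumes the T5 convention that the pad token (id 0) doubles as the decoder start token and that masked label positions are marked with -100.

```python
from typing import List

PAD_ID = 0           # assumption: T5 reuses the pad token (id 0) as the decoder start token
IGNORE_INDEX = -100  # assumption: label value that the loss ignores


def shift_right(labels: List[int], start_id: int = PAD_ID) -> List[int]:
    """Build teacher-forcing decoder inputs: prepend the start token, drop the last label."""
    shifted = [start_id] + labels[:-1]
    # -100 is not a valid token id, so masked positions are replaced with padding.
    return [PAD_ID if tok == IGNORE_INDEX else tok for tok in shifted]


labels = [312, 45, 9, 1]  # hypothetical target ids ending in EOS (id 1)
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [0, 312, 45, 9]
```

At position i the decoder sees the correct tokens up to i-1 and is trained to predict label i, so the labels are only needed as decoder input while training.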

“output of the decoder at the current step” = “next input of the decoder”?
or “label corresponding to the output at the current step” = “next input of the decoder”?
Which one is correct during inference?
Am I missing something in the implementation?

My questions may be confusing, but I would appreciate it if anyone could help me understand this.

kchu02 · July 29, 2022, 12:26am

As I understand it, decoder_input_ids is the beginning of the output sequence that you force the decoder to start with. The encoder-decoder model then continues the sequence from there.
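To make that concrete, here is a rough sketch of greedy decoding in plain Python. It is not the Transformers implementation: `greedy_decode` and `toy_next_token` are hypothetical names, the toy function stands in for "run encoder and decoder, take the argmax", and the start and EOS ids (0 and 1) are assumed to follow T5. At inference no labels are involved. The decoder input starts as just the start token (plus any prefix you force), and each predicted token is appended and fed back in at the next step.

```python
from typing import Callable, List, Optional

START_ID = 0  # assumption: decoder start token (T5 reuses the pad id)
EOS_ID = 1    # assumption: end-of-sequence token id


def greedy_decode(
    next_token: Callable[[List[int], List[int]], int],
    input_ids: List[int],
    prefix: Optional[List[int]] = None,
    max_new_tokens: int = 10,
) -> List[int]:
    """Generate token by token, feeding each prediction back as decoder input."""
    # The decoder input begins with the start token and any forced prefix.
    decoder_input_ids = [START_ID] + list(prefix or [])
    for _ in range(max_new_tokens):
        tok = next_token(input_ids, decoder_input_ids)
        decoder_input_ids.append(tok)  # the output of this step is the input of the next
        if tok == EOS_ID:
            break
    return decoder_input_ids


def toy_next_token(input_ids: List[int], decoder_input_ids: List[int]) -> int:
    """Hypothetical stand-in for a real model: copies the input, then emits EOS."""
    pos = len(decoder_input_ids) - 1  # tokens already present after the start token
    return input_ids[pos] if pos < len(input_ids) else EOS_ID


print(greedy_decode(toy_next_token, [7, 8, 9]))              # [0, 7, 8, 9, 1]
print(greedy_decode(toy_next_token, [7, 8, 9], prefix=[5]))  # [0, 5, 8, 9, 1]
```

In Transformers, `model.generate(input_ids=..., attention_mask=...)` runs a loop of this kind for you, so decoder_input_ids only has to be supplied by hand when calling the model's forward pass directly without labels.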
