I have been reading the documentation for the T5 model. In the training section, for both unsupervised denoising and supervised training, the comment states that the model is able to create the correct decoder_input_ids. However, I'm not sure how the model is able to do this. The documentation tells me that if decoder_input_ids is None (which it is by default), it takes the values of input_ids. Does this mean the start token is the first token of the input sequence?
The description also mentions a start-sequence token. But I can’t seem to find where this is generated or what the token is.
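To make the question concrete, here is a minimal sketch of what I would expect a "shift right" step to look like if the decoder inputs are derived from the labels. The function name, the use of id 0 (the pad token) as the start token, and the -100 ignore value are assumptions on my part for illustration, not something I have confirmed from the documentation.

```python
from typing import List


def shift_right(
    labels: List[int],
    start_token_id: int = 0,   # assumed: the pad token doubles as the start token
    pad_token_id: int = 0,     # assumed pad id
    ignore_index: int = -100,  # assumed marker for positions ignored by the loss
) -> List[int]:
    """Build decoder inputs by prepending a start token and dropping the last label."""
    shifted = [start_token_id] + labels[:-1]
    # Ignored label positions cannot be fed to the decoder, so map them to the pad id.
    return [pad_token_id if token == ignore_index else token for token in shifted]


# Hypothetical label ids, for illustration only.
labels = [1234, 567, 89, 1]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)
```

If this is roughly what happens, then the start token would be a fixed id prepended to the shifted labels rather than the first token of the input sequence.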