What to use for the target input in the decoder for autoregressive usage

Encoder-decoder (seq2seq) models like T5, BART and PEGASUS are trained using what is called "teacher forcing". This means that during training, the decoder is fed the ground-truth target sequence (shifted one position to the right) as input, rather than its own previous predictions: at each position, the model learns to predict the next target token given the source text and the true target prefix.
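As a minimal sketch of how the decoder input is prepared for teacher forcing (the function name and token ids here are illustrative, not taken from any particular library — though libraries like Transformers do something very similar internally):

```python
def shift_tokens_right(labels, decoder_start_token_id, pad_token_id):
    """Build decoder inputs by shifting the target sequence one position
    to the right, so the decoder predicts token t from the ground-truth
    tokens < t rather than from its own earlier predictions."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Labels masked out of the loss (conventionally -100) are replaced
    # with the real padding token id in the decoder input.
    return [pad_token_id if tok == -100 else tok for tok in shifted]

# Made-up target token ids; 1 = end-of-sequence in this toy example.
labels = [42, 7, 99, 1]
decoder_inputs = shift_tokens_right(labels,
                                    decoder_start_token_id=0,
                                    pad_token_id=0)
print(decoder_inputs)  # [0, 42, 7, 99]
```

So during training you only need to supply the labels; the shifted decoder inputs can be derived from them automatically.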

At inference time there is no target sequence: when you call model.generate(), the decoder instead consumes its own previously generated tokens, one step at a time. Normally, if your training dataset is diverse enough, the model will still perform well in this autoregressive setting. Using decoding techniques such as beam search and top-k sampling, you can get good results, even on unseen inputs.
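To make the beam search idea concrete, here is a toy sketch over a hand-made next-token distribution (the probability table and vocabulary are invented purely for illustration; a real model would produce these scores):

```python
import math

# Toy next-token log-probabilities: LOGP[prev][tok] for a 3-token
# vocabulary {0, 1, 2}, where 2 plays the role of end-of-sequence.
LOGP = {
    0: {0: math.log(0.4), 1: math.log(0.5), 2: math.log(0.1)},
    1: {0: math.log(0.3), 1: math.log(0.2), 2: math.log(0.5)},
    2: {0: math.log(1/3), 1: math.log(1/3), 2: math.log(1/3)},
}

def beam_search(start, num_beams=2, max_len=4, eos=2):
    """Keep the `num_beams` highest-scoring partial sequences at each
    step, instead of committing to the single best token (greedy)."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:  # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, lp in LOGP[seq[-1]].items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]

print(beam_search(0))  # [0, 1, 2]
```

Because the search scores whole sequences rather than single steps, it can recover continuations that a purely greedy decoder would discard early on.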