Hello,
In chapter 7 of the course, I am going through the Text Summarization example.
There is a section where they use `DataCollatorForSeq2Seq` before training, with the following explanation:
> Next, we need to define a data collator for our sequence-to-sequence task.
> Since mT5 is an encoder-decoder Transformer model, one subtlety with
> preparing our batches is that during decoding we need to shift the labels
> to the right by one. This is required to ensure that the decoder only sees
> the previous ground truth labels and not the current or future ones, which
> would be easy for the model to memorize. This is similar to how masked
> self-attention is applied to the inputs in a task like [causal language modeling](https://huggingface.co/course/chapter7/6).
> Luckily, 🤗 Transformers provides a `DataCollatorForSeq2Seq` collator that
> will dynamically pad the inputs and the labels for us. To instantiate this
> collator, we simply need to provide the `tokenizer` and `model`.
I am having trouble understanding the explanation. Could anyone help? Perhaps with an example?
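To make my question concrete, here is my current understanding of the padding and shifting, written out in plain Python without the library (the constant values and helper names are my own guesses, not anything from 🤗 Transformers). Does this roughly match what `DataCollatorForSeq2Seq` produces?

```python
# My mental model of the collator, with assumed values:
PAD_ID = 0            # assumed pad token id
DECODER_START_ID = 0  # I believe mT5 starts decoding with the pad token id
IGNORE_INDEX = -100   # label positions set to -100 are ignored by the loss

def pad_to(seq, length, value):
    """Pad a sequence on the right to the given length."""
    return seq + [value] * (length - len(seq))

def shift_right(padded_labels):
    """Prepend the decoder start token and drop the last label, so the
    decoder input at position t is the ground-truth label from t-1.
    Any -100 padding that gets shifted in is replaced by the pad id."""
    shifted = [DECODER_START_ID] + padded_labels[:-1]
    return [PAD_ID if tok == IGNORE_INDEX else tok for tok in shifted]

# Two label sequences of different lengths in one batch
batch_labels = [[42, 7, 99], [13, 5]]
max_len = max(len(labels) for labels in batch_labels)

# Dynamic padding: labels are padded with -100 so padding is not penalized
padded_labels = [pad_to(labels, max_len, IGNORE_INDEX) for labels in batch_labels]

# Decoder inputs are the padded labels shifted right by one
decoder_input_ids = [shift_right(labels) for labels in padded_labels]

print(padded_labels)       # [[42, 7, 99], [13, 5, -100]]
print(decoder_input_ids)   # [[0, 42, 7], [0, 13, 5]]
```

So at every position the decoder only conditions on earlier labels, and the loss skips the `-100` positions. Is that the right picture?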
Thank you