What is the purpose of `DataCollatorForSeq2Seq` when using an encoder-decoder architecture?


In Chapter 7 of the course, I am going through the Text Summarization example.

There is a section where they use `DataCollatorForSeq2Seq` before training and provide the following explanation for it:

> Next, we need to define a data collator for our sequence-to-sequence task. Since mT5 is an encoder-decoder Transformer model, one subtlety with preparing our batches is that during decoding we need to shift the labels to the right by one. This is required to ensure that the decoder only sees the previous ground truth labels and not the current or future ones, which would be easy for the model to memorize. This is similar to how masked self-attention is applied to the inputs in a task like [causal language modeling](https://huggingface.co/course/chapter7/6).
>
> Luckily, 🤗 Transformers provides a `DataCollatorForSeq2Seq` collator that will dynamically pad the inputs and the labels for us. To instantiate this collator, we simply need to provide the `tokenizer` and `model`.
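To show where I'm at: my rough mental model of what the collator does is the sketch below, i.e. pad the labels in a batch to the same length and build the shifted decoder inputs. The token ids and the choice of `0` as the pad/decoder-start id are made up for illustration (I believe `-100` is the label value the loss ignores, but I may be wrong about the details):

```python
PAD_TOKEN_ID = 0            # hypothetical tokenizer.pad_token_id
DECODER_START_TOKEN_ID = 0  # hypothetical model.config.decoder_start_token_id
LABEL_PAD = -100            # value ignored by the cross-entropy loss (my assumption)

def collate_labels(batch_labels):
    """Pad labels to a common length, then shift them right by one
    to form the decoder inputs (my understanding of the collator)."""
    max_len = max(len(labels) for labels in batch_labels)
    padded = [labels + [LABEL_PAD] * (max_len - len(labels)) for labels in batch_labels]
    # Shift right: the decoder sees a start token, then the *previous*
    # ground-truth labels, never the current or future ones.
    decoder_input_ids = [
        [DECODER_START_TOKEN_ID]
        + [t if t != LABEL_PAD else PAD_TOKEN_ID for t in labels[:-1]]
        for labels in padded
    ]
    return {"labels": padded, "decoder_input_ids": decoder_input_ids}

batch = collate_labels([[10, 11, 12], [20, 21]])
print(batch["labels"])             # [[10, 11, 12], [20, 21, -100]]
print(batch["decoder_input_ids"])  # [[0, 10, 11], [0, 20, 21]]
```

Is this roughly what `DataCollatorForSeq2Seq` is doing under the hood, and is that why it needs the `model` (to know the decoder start token)?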

I am having trouble understanding the explanation. Could anyone help, perhaps with an example?

Thank you