Hello,
In chapter 7 of the course, I am going through the Text Summarization example.
There is a section where they use `DataCollatorForSeq2Seq` before training, with the following explanation:
> Next, we need to define a data collator for our sequence-to-sequence task.
> Since mT5 is an encoder-decoder Transformer model, one subtlety with
> preparing our batches is that during decoding we need to shift the labels
> to the right by one. This is required to ensure that the decoder only sees
> the previous ground truth labels and not the current or future ones, which
> would be easy for the model to memorize. This is similar to how masked
> self-attention is applied to the inputs in a task like [causal language modeling](https://huggingface.co/course/chapter7/6).
> Luckily, 🤗 Transformers provides a `DataCollatorForSeq2Seq` collator that
> will dynamically pad the inputs and the labels for us. To instantiate this
> collator, we simply need to provide the `tokenizer` and `model`.
I am having trouble understanding the explanation. Could anyone help? Perhaps with an example?
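To make my question concrete, here is my current understanding of the padding and shifting, written out in plain Python without the library (the constant values and helper names are my own guesses, not anything from 🤗 Transformers). Does this roughly match what `DataCollatorForSeq2Seq` produces?

```python
# My mental model of the collator, with assumed values:
PAD_ID = 0            # assumed pad token id
DECODER_START_ID = 0  # I believe mT5 starts decoding with the pad token id
IGNORE_INDEX = -100   # label positions set to -100 are ignored by the loss

def pad_to(seq, length, value):
    """Pad a sequence on the right to the given length."""
    return seq + [value] * (length - len(seq))

def shift_right(padded_labels):
    """Prepend the decoder start token and drop the last label, so the
    decoder input at position t is the ground-truth label from t-1.
    Any -100 padding that gets shifted in is replaced by the pad id."""
    shifted = [DECODER_START_ID] + padded_labels[:-1]
    return [PAD_ID if tok == IGNORE_INDEX else tok for tok in shifted]

# Two label sequences of different lengths in one batch
batch_labels = [[42, 7, 99], [13, 5]]
max_len = max(len(labels) for labels in batch_labels)

# Dynamic padding: labels are padded with -100 so padding is not penalized
padded_labels = [pad_to(labels, max_len, IGNORE_INDEX) for labels in batch_labels]

# Decoder inputs are the padded labels shifted right by one
decoder_input_ids = [shift_right(labels) for labels in padded_labels]

print(padded_labels)       # [[42, 7, 99], [13, 5, -100]]
print(decoder_input_ids)   # [[0, 42, 7], [0, 13, 5]]
```

So at every position the decoder only conditions on earlier labels, and the loss skips the `-100` positions. Is that the right picture?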
Thank you