Question on HuggingFace's T5 documentation

Seungjun · May 18, 2023, 9:00am

I have a few questions about how T5 is trained, after reading HuggingFace's T5 doc.

I suspect this statement may not be true:

“T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format”

Isn't the unsupervised task (filling in masked tokens) used for unsupervised (self-supervised) pre-training, and the supervised tasks (e.g. Summarize: …) used for fine-tuning?
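Whichever stage the supervised tasks belong to, "text-to-text" means both kinds of example end up with the same shape: an input string and a target string. A minimal illustration, assuming the `<extra_id_N>` sentinel naming used by HuggingFace's T5 tokenizer; the helper `to_text_to_text` and the placeholder strings are hypothetical, and this says nothing about how the training mixture is actually sampled.

```python
# Illustrative only: both objectives reduced to (input_text, target_text) pairs.

def to_text_to_text(task_prefix: str, source: str, target: str) -> dict:
    """Hypothetical helper: the task is named by a text prefix such as 'summarize:'."""
    return {"input_text": f"{task_prefix} {source}", "target_text": target}

# Self-supervised (denoising) pair: no task prefix, sentinel tokens mark the gaps.
denoising = {
    "input_text": "The <extra_id_0> walks in <extra_id_1> park",
    "target_text": "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>",
}

# Supervised pair: same shape, only the prefix tells the model which task it is.
supervised = to_text_to_text("summarize:", "<article text>", "<summary text>")

print(supervised["input_text"])  # summarize: <article text>
```

Because the format is identical, the same model and loss can be trained on either kind of pair; only the data changes.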

Did I understand the following correctly?

So this is for pre-training:
“The input of the encoder is the corrupted sentence, the input of the decoder is the original sentence and the target is then the dropped out tokens delimited by their sentinel tokens.”
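The corruption step in that quote can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not T5's actual preprocessing: it splits on whitespace instead of using the SentencePiece tokenizer, takes hand-picked spans instead of randomly sampled ones, borrows the `<extra_id_N>` sentinel naming from HuggingFace's T5 tokenizer, and omits the `</s>` end-of-sequence token the tokenizer would append. `span_corrupt` is a hypothetical helper name.

```python
def span_corrupt(tokens, spans):
    """Replace each half-open [start, end) span in `tokens` with a sentinel.

    Spans must be sorted and non-overlapping. Returns (encoder_input, target):
    the corrupted sequence, and the dropped spans each introduced by the
    sentinel that replaced it, closed by one final sentinel.
    (Sketch only: no EOS token is appended.)
    """
    encoder_input, target = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        encoder_input.extend(tokens[cursor:start])  # keep text before the span
        encoder_input.append(sentinel)              # span replaced by one sentinel
        target.append(sentinel)                     # target: sentinel, then the span
        target.extend(tokens[start:end])
        cursor = end
    encoder_input.extend(tokens[cursor:])           # keep text after the last span
    target.append(f"<extra_id_{len(spans)}>")       # final sentinel closes the target
    return encoder_input, target


tokens = "The cute dog walks in the park".split()
inp, tgt = span_corrupt(tokens, [(1, 3), (5, 6)])
print(" ".join(inp))  # The <extra_id_0> walks in <extra_id_1> park
print(" ".join(tgt))  # <extra_id_0> cute dog <extra_id_1> the <extra_id_2>
```

In this sketch the encoder sees the corrupted sentence and the training target is the sentinel-delimited dropped spans, matching the encoder input and target described in the quote.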

And this is for fine-tuning:
“It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids.”
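The "shift to the right" in that quote can be sketched with plain lists. This assumes T5's convention of using the pad id (0) as the decoder start token; HuggingFace performs the equivalent step internally on tensors when you pass `labels`, so you normally do not write this yourself. `shift_right` is a hypothetical name, and the token ids below are made up.

```python
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Turn a target sequence into decoder inputs for teacher forcing.

    Prepends the start token and drops the last target token, so at position t
    the decoder is fed the ground-truth tokens before t and is trained to
    predict labels[t]. Any -100 (the index ignored by the loss) that moves into
    the inputs is replaced by the pad id. Assumes T5's start id = pad id = 0.
    """
    shifted = [decoder_start_token_id] + list(labels[:-1])
    return [pad_token_id if tok == -100 else tok for tok in shifted]


labels = [11, 12, 13, 1]            # made-up target ids
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)            # [0, 11, 12, 13]

# Each decoder position is fed the true previous token and predicts the next one.
for fed, predicted in zip(decoder_input_ids, labels):
    print(f"input {fed:>3} -> target {predicted}")

# Padded labels: -100 positions are ignored by the loss and become pad ids here.
print(shift_right([11, 12, 1, -100, -100]))  # [0, 11, 12, 1, 0]
```

The point of teacher forcing is visible in the loop: the decoder never consumes its own predictions during training, only the shifted ground truth.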

Thanks
