If I understand correctly, the pre-trained T5 models were trained with an unsupervised objective, without any task-specific prefix like “translate”, “summarize”, etc. Is it important, then, to build my summarization dataset for fine-tuning so that every input starts with "summarize: "?
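For concreteness, here is a minimal sketch of what "every input starts with summarize: " means in practice. It assumes the raw data comes as a batch stored as a dict of lists with hypothetical `document` and `summary` columns (the shape Hugging Face `datasets` passes to a function used with `Dataset.map(..., batched=True)`). `add_summarize_prefix` is a hypothetical helper, not a library function.

```python
PREFIX = "summarize: "


def add_summarize_prefix(batch, prefix=PREFIX, source_column="document"):
    """Return a copy of a batched example dict with `prefix` prepended to each source text.

    Assumes `batch` maps column names to equal-length lists, e.g.
    {"document": [...], "summary": [...]}; the column names are hypothetical.
    Only the input side gets the prefix; the target summaries are left unchanged.
    """
    out = dict(batch)  # shallow copy so the caller's batch is not mutated
    out[source_column] = [
        # Skip texts that already carry the prefix, so applying this twice is harmless.
        text if text.startswith(prefix) else prefix + text
        for text in batch[source_column]
    ]
    return out


if __name__ == "__main__":
    batch = {
        "document": ["The cat sat on the mat all afternoon.", "Stocks rose sharply on Monday."],
        "summary": ["A cat sat.", "Stocks rose."],
    }
    for text in add_summarize_prefix(batch)["document"]:
        print(text)
```

With `datasets`, a function like this could be applied as `dataset.map(add_summarize_prefix, batched=True)` before tokenization.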
I think it is important, but I am not totally certain why. I bet you could test it pretty easily.
Actually, it may not be very important.
I was confused about the prefix too and asked on GitHub, and one of the code authors replied. (There are more questions there; you may benefit from them too as a beginner.)
I think the question is whether the released pre-trained models were trained on a mixture of supervised and unsupervised tasks (in which case they learned with prefixes, and putting a prefix in front of my input when fine-tuning is useful) or only on unsupervised tasks. I know the paper does it both ways, but it's not clear which version was released.
Use a prefix if your task is similar to one of the pre-training tasks; otherwise it really doesn’t matter (at least, that is what I have observed in my own experiments).
The released models were pre-trained on both supervised and unsupervised tasks, so a prefix should help. See https://github.com/google-research/text-to-text-transfer-transformer/issues/314#issuecomment-665764712
Yes, that’s what I said: if the task is similar to one of the tasks in the pre-training mixture, the prefix helps; if it’s completely different, it won’t matter.
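The rule of thumb above can be sketched as a lookup: use the pre-training prefix when the fine-tuning task matches a task in the pre-training mixture, and no prefix otherwise. The table is a small, assumed subset; as far as I know, its keys and strings match the `task_specific_params` prefixes in the Hugging Face `t5-small`/`t5-base` configs, but `choose_prefix` and `build_input` are hypothetical helpers.

```python
# Assumed subset of T5 pre-training-mixture prefixes, keyed like the
# `task_specific_params` entries in the Hugging Face t5-small config.
KNOWN_T5_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}


def choose_prefix(task: str) -> str:
    """Hypothetical helper: the pre-training prefix if `task` matches one, else an empty string."""
    return KNOWN_T5_PREFIXES.get(task, "")


def build_input(task: str, text: str) -> str:
    """Prepend whatever prefix `choose_prefix` picks (possibly none) to the raw input text."""
    return choose_prefix(task) + text


if __name__ == "__main__":
    print(repr(build_input("summarization", "Long article text.")))
    print(repr(build_input("my_custom_task", "Some input.")))
```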
Yes, that part was clear. I just wanted to clarify how the released models were trained, although you are right that the answer was implicit in what you already said.
The great part is that T5 performs really well with and without a prefix. Here’s what I observed in my experiments:
- It converges slightly faster with a task prefix when the task is similar to a pre-training task, say summarization.
- It performed equally well without a prefix, but took slightly longer to converge.
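One way to run this kind of comparison yourself is to build both input variants from the same examples, fine-tune once on each, and compare how many logged steps each run needs to reach the same loss. A sketch under those assumptions; `make_variants` and `steps_to_reach` are hypothetical helpers, and the training loop itself is left out.

```python
from typing import List, Optional, Sequence, Tuple


def make_variants(texts: Sequence[str], prefix: str = "summarize: ") -> Tuple[List[str], List[str]]:
    """Return (with_prefix, without_prefix) input lists built from the same raw texts."""
    with_prefix = [prefix + t for t in texts]
    without_prefix = list(texts)
    return with_prefix, without_prefix


def steps_to_reach(losses: Sequence[float], target: float) -> Optional[int]:
    """Index of the first logged loss at or below `target`, or None if it is never reached.

    `losses` is assumed to be the loss your own training loop logged at each
    step (or logging interval) for one run.
    """
    for step, loss in enumerate(losses):
        if loss <= target:
            return step
    return None
```

Comparing `steps_to_reach(...)` for the prefixed and unprefixed runs at the same target turns "converges slightly faster" into a concrete number; a single run per variant can be noisy, so repeating with a few seeds is worth considering.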