Does a task-specific prefix matter for T5 fine-tuning?

marton-avrios · July 28, 2020, 4:57pm

If I understand correctly, the pre-trained T5 models were pre-trained with an unsupervised objective, without any task-specific prefix like “translate”, “summarize”, etc. Is it important, then, to create my summarization dataset for fine-tuning in such a way that every input starts with "summarize: "?
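
For illustration, building such a dataset could look roughly like the sketch below. It is plain Python with no library calls; the field names `document` and `summary` and the helper `add_prefix` are hypothetical, chosen only for this example, and the prefix is treated as ordinary text prepended to the input.

```python
from typing import Dict

# Prefix string quoted in the question; treated here as plain text.
TASK_PREFIX = "summarize: "


def add_prefix(example: Dict[str, str], prefix: str = TASK_PREFIX) -> Dict[str, str]:
    """Return a copy of the example whose input text starts with the prefix.

    The "document" key is a hypothetical field name for the input text.
    """
    text = example["document"]
    # Avoid adding the prefix twice if the data was already processed.
    if not text.startswith(prefix):
        text = prefix + text
    return {**example, "document": text}


# Tiny made-up example; real data would come from your own dataset.
examples = [
    {"document": "The cat sat on the mat all day.", "summary": "A cat sat."},
]
prefixed = [add_prefix(ex) for ex in examples]
print(prefixed[0]["document"])
```

Only the input side is changed; the target summary is left as it is.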

sshleifer · July 28, 2020, 7:27pm

I think it is important, but am not totally certain why. You could test it pretty easily I bet.

mengyahu · July 29, 2020, 2:20am

Actually, it may not be very important.
I was confused about the prefix too and asked on GitHub, and one of the code authors replied. (There are more questions there; as a beginner, you may benefit from them too.)
https://github.com/huggingface/transformers/issues/6007

marton-avrios · July 29, 2020, 5:44am

I think the question is whether the released pre-trained models were trained on a mixture of supervised and unsupervised tasks (so they learned with prefixes, and putting them in front of my input when fine-tuning is useful) or only on unsupervised tasks. I know they do it both ways in the paper, but it is not clear which one was released.

valhalla · July 29, 2020, 6:07am

Use a prefix if your task is similar to one of the pre-training tasks; otherwise it really doesn’t matter (at least that is what I have observed in my own experiments).

marton-avrios · July 29, 2020, 4:30pm

The released models were pre-trained on both supervised and unsupervised tasks, so the prefix should help. See https://github.com/google-research/text-to-text-transfer-transformer/issues/314#issuecomment-665764712

valhalla · July 29, 2020, 4:32pm

Yes, that’s what I said: if the task is similar to one of the tasks used in the pre-training mixture, then it helps; if it’s completely different, then it won’t matter.

marton-avrios · July 29, 2020, 4:35pm

Yes, that part was clear. I just wanted to clarify how the released models were trained. Although you are right, the answer was implicit in what you already said.

valhalla · July 29, 2020, 4:40pm

The great part is that T5 performs really well both with and without a prefix. Here’s what I observed in my experiments:

With a task prefix, it converged slightly faster when the task was similar to a pre-training task, say summarization.
Without a prefix, it performed equally well but took slightly longer to converge.
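
A comparison like this could be set up by preparing two copies of the same inputs, one with the prefix and one without, and fine-tuning on each under otherwise identical settings. The sketch below covers only the data preparation; `build_variants` and the variant names are hypothetical, and it is not the code used in the experiments described here.

```python
from typing import Dict, List


def build_variants(documents: List[str], prefix: str = "summarize: ") -> Dict[str, List[str]]:
    """Return two versions of the same inputs: with and without the task prefix."""
    return {
        # Variant A: every input starts with the task prefix.
        "with_prefix": [prefix + doc for doc in documents],
        # Variant B: inputs are passed through unchanged.
        "without_prefix": list(documents),
    }


# Made-up inputs, used only to show the shape of the output.
docs = ["First article text.", "Second article text."]
variants = build_variants(docs)

for name, inputs in variants.items():
    print(name, inputs[0])
```

Each variant would then go through the same tokenization and training loop, so that any difference in convergence can be attributed to the prefix.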

Arij · June 28, 2021, 7:10am

My question is related: how can I understand how the prefix and prefix masking work? Where can I read the code that processes the prefix? Or is there nothing special about it? I mean, is it simply a matter of adding a task in the config file? Is the idea as simple as that?
