Does the task-specific prefix matter for T5 fine-tuning?

If I understand correctly, the pre-trained T5 models were trained with an unsupervised objective, without any task-specific prefix like “translate:”, “summarize:”, etc. Is it then important to build my summarization dataset for fine-tuning so that every input starts with "summarize: "?
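
Concretely, I mean a preprocessing step like the sketch below (just an illustration assuming the Hugging Face `transformers` tokenizer; the dataset field names `article` and `highlights` are made up):

```python
# Minimal preprocessing sketch: prepend the task prefix, then tokenize as usual.
# Field names and length limits are hypothetical, not from any specific dataset.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

def preprocess(example):
    # The prefix is just plain text prepended to the input string.
    model_input = tokenizer(
        "summarize: " + example["article"],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(
        example["highlights"],
        max_length=128,
        truncation=True,
    )
    model_input["labels"] = labels["input_ids"]
    return model_input
```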

I think it is important, but I'm not totally certain why. You could test it pretty easily, I bet.
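
For example, something along these lines would show whether the vanilla checkpoint already reacts to the prefix (a rough sketch assuming the `transformers` generation API; the `t5-small` checkpoint and example sentence are just placeholders):

```python
# Compare generations from the pre-trained checkpoint with and without the prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "The tower is 324 metres tall, about the same height as an 81-storey building."

for prompt in (text, "summarize: " + text):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(prompt[:20], "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))
```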

Actually, it may not be very important.
I was confused about the prefix too and asked about it on GitHub, and one of the code authors replied. (There are more questions in that issue; as a beginner you may benefit from it too.)
https://github.com/huggingface/transformers/issues/6007

I think the question is whether the released pre-trained models were trained on a mixture of supervised and unsupervised tasks (so they learned with prefixes, and putting a prefix in front of my inputs when fine-tuning would be useful) or only on unsupervised tasks. I know the paper does it both ways, but it's not clear which variant was released.


Use a prefix if your task is similar to one of the pre-training tasks; otherwise it really doesn’t matter (at least that's what I have observed in my own experiments).


The released models were pre-trained on both supervised and unsupervised tasks, so the prefix should help. See https://github.com/google-research/text-to-text-transfer-transformer/issues/314#issuecomment-665764712


Yes, that’s what I said: if the task is similar to one of the tasks used in the pre-training mixture, then it helps; if it’s completely different, then it won’t matter.

Yes, that part was clear. I just wanted to clarify how the released models were trained. Although you are right, the answer was implicit in what you already said.

The great part is that T5 performs really well both with and without a prefix :smile: Here’s what I observed in my experiments:

  1. It converged slightly faster when using a task prefix and the task was similar to one of the pre-training tasks, e.g. summarization.
  2. It performed equally well even without the prefix, it just took slightly longer to converge.

My question is related: how can I understand how the prefix and prefix masking work? Where can I read the code related to processing the prefix? Or is there nothing special about it, i.e. is it simply adding a task prefix in the config file? Is the idea as simple as that?
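
For instance, is it just something like the sketch below, i.e. a plain string stored in the config and prepended to the input text, with no special masking? (Just my guess of how it works, assuming the stock `t5-small` config; please correct me if the prefix is handled differently.)

```python
# Sketch: inspect where the pipeline prefixes live and confirm they are ordinary tokens.
from transformers import T5Config, T5Tokenizer

config = T5Config.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# The per-task prefixes are stored as plain strings in the model config.
print(config.task_specific_params["summarization"]["prefix"])  # e.g. "summarize: "

# A prefixed input is tokenized like any other text; the attention mask is all ones.
enc = tokenizer("summarize: The quick brown fox jumps over the lazy dog.")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"])[:4])
print(enc["attention_mask"][:4])
```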
