I want to do a multilingual token regression task where the target value depends on an external language that is not part of the input. In other words, I want to model properties between languages while only having access to the original source text. For instance, if I have an English source text and I want to predict a value for each token that quantifies how that token relates to French, then the result should be different from when it relates to German.
T5 seems like a good candidate here, since it was already pretrained on translation tasks with prefixes like “translate English to German: ”, so a lot of the relevant information should already be in the model. I have some questions about this:
- Are these prefixes special tokens that the tokenizer leaves untouched (like `<s>` etc.), or can they be any string, which is then tokenised normally? Is there anything “special” about the prefix?
- If there is nothing special about the string, can I assume that if I change the prefix from “translate French to German” to “French to German”, that part of the pretrained model is still taken into account (since it “recognises” the languages at the start), or would the words really need to be in the same position?
If they have to be in the same position, can I just use them as they were pretrained (“translate French to German”) and simply add a regression head instead of the LM head? (see edit below)
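For context on the first question, here is a small sketch (assuming the Hugging Face `transformers` library and the `t5-small` checkpoint) showing that the task prefix is ordinary text: it goes through the SentencePiece tokenizer like any other string and produces regular subword pieces, not reserved special tokens.

```python
# Sketch: the T5 task prefix is plain text, tokenised like anything else.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")

pieces = tok.tokenize("translate English to German: That is good.")
print(pieces)  # ordinary SentencePiece pieces such as '▁translate', '▁English', ...

# None of the prefix pieces are special tokens (</s>, <pad>, <unk>, sentinels):
special = set(tok.all_special_tokens)
assert not any(p in special for p in pieces)
```

Since nothing marks the prefix as special, the model only “knows” about it through what it saw during pretraining, which is exactly what the second question hinges on.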
Thanks a lot for your time! I really need to dig further into the models that were introduced after RoBERTa but, you know how it goes, life got in the way. So it’s nice that there’s a place here to ask some questions!
EDIT: of course, T5 is text-to-text, so I should not add a specific regression head. I’ll have to dig deeper into how you evaluate on a regression task then, though. It seems very counter-intuitive to evaluate a regression model as a generation model. So if you have more information on using T5 for token regression, that’s welcome as well.
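If you do decide to deviate from the pure text-to-text recipe, one option is to keep only the encoder and put a token-level regression head on top. A minimal sketch, assuming `transformers`’ `T5EncoderModel`; the class name `T5TokenRegressor` and the tiny randomly initialised config are my own illustrative choices (in practice you would load pretrained weights, e.g. `T5EncoderModel.from_pretrained("t5-small")`):

```python
# Sketch: token-level regression on top of T5's encoder (no decoder, no LM head).
import torch
from transformers import T5Config, T5EncoderModel


class T5TokenRegressor(torch.nn.Module):  # hypothetical name
    def __init__(self, config: T5Config):
        super().__init__()
        self.encoder = T5EncoderModel(config)
        self.head = torch.nn.Linear(config.d_model, 1)  # one scalar per token

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                    # (batch, seq_len, d_model)
        return self.head(hidden).squeeze(-1)   # (batch, seq_len)


# Tiny random config so the sketch runs without downloading weights.
config = T5Config(vocab_size=100, d_model=32, d_ff=64, d_kv=8,
                  num_layers=2, num_heads=4)
model = T5TokenRegressor(config)

ids = torch.randint(0, 100, (2, 7))  # stands in for "prefix + source text" ids
scores = model(ids)                  # one regression value per input token
print(scores.shape)                  # torch.Size([2, 7])
loss = torch.nn.functional.mse_loss(scores, torch.zeros_like(scores))
```

With this setup you evaluate like any regression model (e.g. MSE or Pearson correlation per token) instead of as a generator; the language conditioning would then come purely from the textual prefix in the input.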