I’ve been playing around with the new Flan-T5 model and there seem to be different (contradictory?) pieces of information on how to run it.
The model card uses the following classes:
from transformers import T5Tokenizer, T5ForConditionalGeneration
The FLAN-T5 docs use these classes:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
At the same time, the FLAN-T5 docs refer to the original T5 docs for advice on fine-tuning, and there the model-specific classes are used again:
from transformers import T5Tokenizer, T5ForConditionalGeneration
Meanwhile, the most comprehensive guide for fine-tuning (m)T5 is the summarization tutorial in the HF course, which uses the Seq2Seq class again:
from transformers import AutoModelForSeq2SeqLM
I understand the difference between the Auto… classes and the model-specific T5 classes, but I’m not sure what the practical difference between ConditionalGeneration and Seq2Seq is. My previous understanding was that the ConditionalGeneration classes are for GPT-like autoregressive/next-token-prediction models, while the Seq2Seq classes are for T5-like models (pre-trained with a masking objective). What’s confusing to me is that both classes seem to work with versions of T5 (but not always with all versions).
=> Should one use AutoModelForSeq2SeqLM or T5ForConditionalGeneration for FLAN-T5?
What is the main difference between the two classes? (My use-case is specifically about fine-tuning FLAN-T5).
@MoritzLaurer Any updates on this particular issue? Also, can you share any reference tutorials/code/docs that would be helpful regarding this issue?
Hi @FrozenWolf, no, I haven’t found an answer to this question unfortunately. The links in the question above are the main resources I found in this regard.
I’ve run into the same issue as well. Actually, the most up-to-date tutorial about FLAN-T5 is, I believe, this one: Fine-tune FLAN-T5 for chat & dialogue summarization by @philschmid. Maybe he can tell us the differences?
Hi @MoritzLaurer
Thanks for the issue and for your message. If I understood correctly, the question here is whether to use T5ForConditionalGeneration or AutoModelForSeq2SeqLM for FLAN-T5 (or T5 in general).
One simple check that you can do is to run the script below:
from transformers import AutoModelForSeq2SeqLM, T5ForConditionalGeneration
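# Load the same checkpoint via both the Auto class and the T5-specific class,
# then compare the classes of the resulting objects.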
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(model.__class__.__name__)
model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(model.__class__.__name__)
And you will observe the output:
T5ForConditionalGeneration
T5ForConditionalGeneration
AutoModelForSeq2SeqLM lets you load the correct seq2seq class for a given checkpoint. The auto-mapping class retrieves the correct class from this list. Therefore there is no practical difference between the xxxForConditionalGeneration classes and AutoModelForSeq2SeqLM: for a given checkpoint, they resolve to the same class.
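As a quick illustration of that auto-mapping (facebook/bart-base is just an arbitrary public seq2seq checkpoint I'm using for contrast, not one discussed in this thread):

from transformers import AutoModelForSeq2SeqLM

# The same Auto class resolves to a different architecture-specific class
# depending on each checkpoint's config.
for ckpt in ["t5-small", "facebook/bart-base"]:
    model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
    print(ckpt, "->", model.__class__.__name__)
# t5-small -> T5ForConditionalGeneration
# facebook/bart-base -> BartForConditionalGeneration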
For decoder-only models (e.g. GPT-2), one should use the xxxForCausalLM classes (or AutoModelForCausalLM).
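For example (using gpt2 as an illustrative decoder-only checkpoint):

from transformers import AutoModelForCausalLM

# For a decoder-only checkpoint, the causal-LM Auto class resolves to the
# model-specific LM-head class.
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(model.__class__.__name__)  # GPT2LMHeadModel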
Flan-T5 is not a new architecture itself; it is a series of T5 models fine-tuned in a different manner than the original T5. Therefore you can use either T5ForConditionalGeneration or AutoModelForSeq2SeqLM.
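For instance, here is a minimal generation sketch with Flan-T5 (google/flan-t5-small is used as an illustrative checkpoint; either loading class would work):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# T5ForConditionalGeneration.from_pretrained(...) would return the exact same class.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))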
Great, thanks for the response. Good to know that it’s effectively the same (although that seems a bit confusing).