Flan-T5 / T5: what is the difference between AutoModelForSeq2SeqLM and T5ForConditionalGeneration

I’ve been playing around with the new Flan-T5 model and there seem to be different (contradictory?) pieces of information on how to run it.

The model card uses the following classes:

from transformers import T5Tokenizer, T5ForConditionalGeneration

The FLAN-T5 docs use these classes:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

At the same time, the FLAN-T5 docs refer to the original T5 docs for advice on fine-tuning, and there the model-specific classes are used again:

from transformers import T5Tokenizer, T5ForConditionalGeneration

Meanwhile, the most comprehensive guide for fine-tuning (m)T5 is the summarization tutorial in the HF course, which again uses the Seq2Seq class:

from transformers import AutoModelForSeq2SeqLM

I understand the difference between the Auto… classes and the model-specific T5 classes, but I’m not sure what the practical difference between ConditionalGeneration and Seq2Seq is. My previous understanding was that the ConditionalGeneration classes are for GPT-like autoregressive/next-token-prediction models, while the Seq2Seq classes are for T5-like models (pre-trained with a masking objective). What’s confusing to me is that both classes seem to work with versions of T5 (but not always with all versions).
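
For concreteness, both of the following seem to load fine for me (a minimal check; google/flan-t5-small is just the smallest FLAN-T5 checkpoint):

from transformers import AutoModelForSeq2SeqLM, T5ForConditionalGeneration

# both load a FLAN-T5 checkpoint without error
model_auto = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
model_t5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")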

=> Should one use AutoModelForSeq2SeqLM or T5ForConditionalGeneration for FLAN-T5?
What is the main difference between the two classes? (My use-case is specifically about fine-tuning FLAN-T5).


@MoritzLaurer Any updates on this particular issue? Also, could you share any reference tutorials/code/docs that would be helpful here?

Hi @FrozenWolf, no, unfortunately I haven’t found an answer to this question. The links in the question above are the main resources I found in this regard.


I’ve run into the same issue as well. The most up-to-date tutorial on FLAN-T5 is, I believe, Fine-tune FLAN-T5 for chat & dialogue summarization by @philschmid. Maybe he can tell us the difference?


Hi @MoritzLaurer

Thanks for the issue and for your message. If I understood correctly, the question here is whether to use T5ForConditionalGeneration or AutoModelForSeq2SeqLM for FLAN-T5, or for T5 in general.
One simple check you can do is to run the script below:

from transformers import AutoModelForSeq2SeqLM, T5ForConditionalGeneration

# let the auto class dispatch based on the checkpoint's config
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(model.__class__.__name__)

# load via the model-specific class directly
model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(model.__class__.__name__)

And you will observe the output:

T5ForConditionalGeneration
T5ForConditionalGeneration

AutoModelForSeq2SeqLM lets you load the correct seq2seq class for a given checkpoint: the auto-mapping reads the checkpoint’s config and retrieves the matching class from this list. For a T5 checkpoint it resolves to T5ForConditionalGeneration, so there is no practical difference between the xxxForConditionalGeneration classes and AutoModelForSeq2SeqLM; they give you the same model.
For decoder-only models (e.g. GPT-2), one should use the xxxForCausalLM classes (or AutoModelForCausalLM).
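
If it helps to see the dispatch concretely, here is a minimal sketch (the second import is an internal mapping, so its path may change between transformers versions):

from transformers import AutoConfig
from transformers.models.auto.modeling_auto import MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES

# the auto class reads model_type from the checkpoint's config ...
config = AutoConfig.from_pretrained("google/flan-t5-small")
print(config.model_type)  # "t5"

# ... and looks up the matching class name in the seq2seq mapping
print(MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES["t5"])  # "T5ForConditionalGeneration"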

FLAN-T5 is not a new architecture; it is a series of T5 models that were further fine-tuned (instruction-tuned) relative to the original T5. Therefore you can use either T5ForConditionalGeneration or AutoModelForSeq2SeqLM.
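
For the fine-tuning use-case in the original question, both classes also share the same training interface: you pass labels and the seq2seq loss is computed internally (the decoder inputs are created from the labels). A minimal sketch with a toy example pair (google/flan-t5-small is just a placeholder checkpoint):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# toy (input, target) pair; a real run would use a dataset and e.g. Seq2SeqTrainer
inputs = tokenizer("summarize: The cat sat on the mat all day long.", return_tensors="pt")
labels = tokenizer(text_target="A cat sat on a mat.", return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)  # loss is computed from the labels internally
outputs.loss.backward()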


Great, thanks for the response! Good to know that they’re effectively the same (although that does seem a bit confusing).