Flan-T5 / T5: what is the difference between AutoModelForSeq2SeqLM and T5ForConditionalGeneration

I’ve been playing around with the new Flan-T5 model, and there seem to be different (contradictory?) pieces of information on how to run it.

The model card uses the following classes:

from transformers import T5Tokenizer, T5ForConditionalGeneration
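
Concretely, the snippet in the model card looks roughly like this (I’m assuming the small checkpoint here; the card follows the same pattern for the other sizes):

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

# Simple inference: encode a prompt, generate, and decode the output.
input_ids = tokenizer("translate English to German: How old are you?", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))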

The FLAN-T5 docs use these classes:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

At the same time, the FLAN-T5 docs refer to the original T5 docs for advice on fine-tuning, and there the model-specific classes are used again:

from transformers import T5Tokenizer, T5ForConditionalGeneration

Meanwhile, the most comprehensive guide for fine-tuning (m)T5 is the summarization tutorial in the HF course, which uses the Seq2Seq class again:

from transformers import AutoModelForSeq2SeqLM

I understand the difference between the Auto… classes and the model-specific T5 classes, but I’m not sure what the difference between ConditionalGeneration and Seq2Seq is in terms of practical usage. My previous understanding was that the ConditionalGeneration classes are for GPT-like autoregressive/next-token-prediction models, while the Seq2Seq classes are for T5-like models (pre-trained with a masking objective). What’s confusing to me is that both classes seem to work with versions of T5 (though not always with every version).
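
For what it’s worth, I did a quick sanity check (sketch below; the checkpoint name is just the one I happened to test with): AutoModelForSeq2SeqLM seems to hand back a T5ForConditionalGeneration instance anyway, which makes the split in the docs even more puzzling to me:

from transformers import AutoModelForSeq2SeqLM, T5ForConditionalGeneration

# The Auto class dispatches on the checkpoint's config, so for a T5-family
# checkpoint it resolves to the model-specific class.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
print(type(model))                                    # T5ForConditionalGeneration
print(isinstance(model, T5ForConditionalGeneration))  # True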

=> Should one use AutoModelForSeq2SeqLM or T5ForConditionalGeneration for FLAN-T5?
=> What is the main difference between the two classes? (My use case is specifically fine-tuning FLAN-T5.)
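
For context, the fine-tuning skeleton I’ve been working from (adapted from the course’s summarization tutorial) looks roughly like this; the checkpoint, the toy dataset, and the hyperparameters are just placeholders:

from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "google/flan-t5-small"  # placeholder; the larger sizes should work the same way
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy stand-in for a real summarization dataset, just to keep the sketch self-contained.
raw = Dataset.from_dict({
    "document": ["The cat sat on the mat all afternoon.", "Transformers process sequences in parallel."],
    "summary": ["A cat sat.", "Transformers are parallel."],
})

def preprocess(batch):
    # Tokenize inputs and targets; the target token ids become the labels.
    model_inputs = tokenizer(batch["document"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-finetuned",  # placeholder
    per_device_train_batch_size=2,
    num_train_epochs=1,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()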