I am new to transformers and have never used the library before.
I want to know: what's the difference between T5Model and T5ForConditionalGeneration, and where is each one used?

Hi @ashiishkarhade
T5Model contains the encoder (a stack of encoder layers) and the decoder (a stack of decoder layers) without any task-specific head. It returns the raw hidden states of the decoder as output.

T5ForConditionalGeneration also contains the encoder and decoder, and adds a linear layer (lm_head) that maps the decoder's final hidden states to logits over the vocabulary, which are used to predict the next token.
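A minimal sketch of the difference, assuming PyTorch and a recent transformers version. It builds both classes from a tiny, randomly initialised T5Config (the sizes are arbitrary, chosen only so the example runs without downloading a checkpoint) and compares what each one returns for the same inputs.

```python
import torch
from transformers import T5Config, T5Model, T5ForConditionalGeneration

# Tiny random config; these sizes are arbitrary and NOT those of any released T5 checkpoint.
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64, num_layers=2, num_heads=4)

torch.manual_seed(0)
base = T5Model(config).eval()
seq2seq = T5ForConditionalGeneration(config).eval()

input_ids = torch.randint(0, config.vocab_size, (2, 7))          # (batch, src_len)
decoder_input_ids = torch.randint(0, config.vocab_size, (2, 5))  # (batch, tgt_len)

with torch.no_grad():
    base_out = base(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    lm_out = seq2seq(input_ids=input_ids, decoder_input_ids=decoder_input_ids)

# T5Model: raw decoder hidden states, one d_model-sized vector per target position.
print(base_out.last_hidden_state.shape)  # torch.Size([2, 5, 32])
# T5ForConditionalGeneration: lm_head projects those states to vocabulary logits.
print(lm_out.logits.shape)               # torch.Size([2, 5, 100])
print(seq2seq.lm_head)                   # Linear(in_features=32, out_features=100, bias=False)
print(hasattr(base, "lm_head"))          # False
```

The only structural difference visible here is the extra lm_head: the last dimension changes from d_model (hidden size) to vocab_size (one score per token).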

For fine-tuning the model for seq2seq generation, you should use T5ForConditionalGeneration; if you want to add a different task-specific head, you can use T5Model.
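To make the two options concrete, here is a hedged sketch using the same kind of tiny random config. Passing labels to T5ForConditionalGeneration makes it return a cross-entropy loss you can backpropagate. T5WithClassifierHead is a hypothetical class (not part of transformers) showing one way to put your own head on T5Model; feeding a single decoder start token and classifying its hidden state is an illustrative choice, not a recommended recipe.

```python
import torch
from torch import nn
from transformers import T5Config, T5Model, T5ForConditionalGeneration

# Tiny random config with arbitrary sizes (not a released checkpoint).
# decoder_start_token_id is set explicitly because T5ForConditionalGeneration
# needs it to shift `labels` right into decoder inputs; T5 uses the pad id 0.
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4, decoder_start_token_id=0)
torch.manual_seed(0)

input_ids = torch.randint(1, config.vocab_size, (2, 7))  # (batch, src_len)
labels = torch.randint(1, config.vocab_size, (2, 5))     # (batch, tgt_len)

# Option 1: seq2seq fine-tuning. With `labels`, the model builds the decoder
# inputs itself and returns a cross-entropy loss computed from lm_head logits.
seq2seq = T5ForConditionalGeneration(config)
out = seq2seq(input_ids=input_ids, labels=labels)
out.loss.backward()  # a real training loop would follow with an optimizer step


# Option 2: a custom head on T5Model (hypothetical class, not part of transformers).
class T5WithClassifierHead(nn.Module):
    def __init__(self, config, num_labels):
        super().__init__()
        self.t5 = T5Model(config)
        self.head = nn.Linear(config.d_model, num_labels)

    def forward(self, input_ids):
        # T5Model needs decoder inputs; here we feed only the start token so the
        # decoder yields one hidden state per example (an illustrative choice).
        start = torch.full((input_ids.size(0), 1),
                           self.t5.config.decoder_start_token_id, dtype=torch.long)
        hidden = self.t5(input_ids=input_ids,
                         decoder_input_ids=start).last_hidden_state  # (batch, 1, d_model)
        return self.head(hidden[:, 0])                                # (batch, num_labels)


clf = T5WithClassifierHead(config, num_labels=3)
logits = clf(input_ids)
print(logits.shape)  # torch.Size([2, 3])
```

For real work you would load pretrained weights (for example with from_pretrained) instead of a random config; the structure of the two options stays the same.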

Almost all models in the library follow this structure: a base model that returns raw hidden states, plus additional models with task-specific heads (ForSequenceClassification, ForQuestionAnswering, etc.) on top of the base model.
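The same base-plus-head layout shows up in other model families. A minimal sketch with BERT, again assuming PyTorch and a tiny random config (arbitrary sizes) so nothing is downloaded: BertForSequenceClassification exposes its underlying BertModel through base_model and adds a classifier layer on top.

```python
import torch
from transformers import BertConfig, BertModel, BertForSequenceClassification

# Tiny random config; sizes are arbitrary and chosen only to keep the sketch fast.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=64, num_labels=3)
torch.manual_seed(0)
clf = BertForSequenceClassification(config).eval()

# The task-specific class wraps the base model and adds a head on top.
print(type(clf.base_model).__name__)  # BertModel
print(clf.classifier)                  # Linear(in_features=32, out_features=3, bias=True)

input_ids = torch.randint(0, config.vocab_size, (2, 6))
with torch.no_grad():
    hidden = clf.base_model(input_ids=input_ids).last_hidden_state  # raw hidden states
    logits = clf(input_ids=input_ids).logits                         # head output
print(hidden.shape, logits.shape)  # torch.Size([2, 6, 32]) torch.Size([2, 3])
```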


Thank you for the brief explanation.