I'm new to the transformers library and have never used it before.
I want to know: what's the difference between T5Model and T5ForConditionalGeneration, and where is each used?
T5Model contains the encoder (a stack of encoder layers) and the decoder (a stack of decoder layers) without any task-specific head. It returns the raw hidden states of the decoder as output.
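To see those raw hidden states, here is a minimal sketch. It builds a tiny, randomly initialized T5 (the config sizes are made up so the example runs without downloading weights; in practice you would call `T5Model.from_pretrained("t5-small")`):

```python
import torch
from transformers import T5Config, T5Model

# Tiny hypothetical config so no pretrained weights are needed.
config = T5Config(d_model=64, d_ff=128, num_layers=2, num_heads=4, vocab_size=100)
model = T5Model(config)

input_ids = torch.tensor([[1, 2, 3, 4]])       # encoder input tokens
decoder_input_ids = torch.tensor([[0, 5, 6]])  # decoder input tokens

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
# Raw decoder hidden states: (batch, decoder_len, d_model)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 3, 64])
```

Note the output has `d_model` as its last dimension, not the vocabulary size, since there is no head on top.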
T5ForConditionalGeneration contains the same encoder and decoder, and adds an additional linear layer (lm_head) that projects the final hidden states of the decoder to vocabulary logits for predicting the next token.
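The difference shows up directly in the output shapes. A sketch with the same tiny, randomly initialized config as above (sizes are hypothetical; `decoder_start_token_id` is set explicitly so the model can shift the labels into decoder inputs):

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Tiny hypothetical config, randomly initialized.
config = T5Config(d_model=64, d_ff=128, num_layers=2, num_heads=4,
                  vocab_size=100, decoder_start_token_id=0)
model = T5ForConditionalGeneration(config)

input_ids = torch.tensor([[1, 2, 3, 4]])
labels = torch.tensor([[5, 6, 7]])  # target tokens for seq2seq training

outputs = model(input_ids=input_ids, labels=labels)
# lm_head projects decoder states to vocabulary logits:
print(outputs.logits.shape)  # torch.Size([1, 3, 100])
# Passing labels also gives you the cross-entropy loss for free:
print(outputs.loss)
```

Here the last dimension of the logits is the vocabulary size, which is what lets this class generate tokens (and compute a language-modeling loss) while T5Model cannot.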
For fine-tuning the model for seq2seq generation you should use T5ForConditionalGeneration; if you want to add a different task-specific head, you can build it on top of T5Model.
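For the second case, a custom head might look like this. This is only an illustrative sketch (the class name, head design, and use of the first decoder position are all my own choices, not anything from the library):

```python
import torch
import torch.nn as nn
from transformers import T5Config, T5Model

class T5WithClassificationHead(nn.Module):
    """Hypothetical example: a custom classification head on the base T5Model."""
    def __init__(self, config, num_labels):
        super().__init__()
        self.t5 = T5Model(config)
        self.classifier = nn.Linear(config.d_model, num_labels)

    def forward(self, input_ids, decoder_input_ids):
        hidden = self.t5(input_ids=input_ids,
                         decoder_input_ids=decoder_input_ids).last_hidden_state
        # Classify from the first decoder position (one design choice among many).
        return self.classifier(hidden[:, 0, :])

# Tiny hypothetical config, randomly initialized.
config = T5Config(d_model=64, d_ff=128, num_layers=2, num_heads=4, vocab_size=100)
model = T5WithClassificationHead(config, num_labels=3)
logits = model(torch.tensor([[1, 2, 3]]), torch.tensor([[0]]))
print(logits.shape)  # torch.Size([1, 3])
```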
Almost all models in the library follow this structure: a base model that returns raw hidden states, and additional models with task-specific heads (ForSequenceClassification, ForQuestionAnswering, etc.) on top of that base model.
Thank you for the brief explanation.