I would like to do sequence classification over the encoder in parallel with conditional generation using an auxiliary loss. However, I am confused about which hidden state I should take for the classification.
Supposing that the hidden state of the last layer has the dimensions [batch size, seq length, hidden size], should I take the last position, i.e. [:, -1, :]?
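For context, here is a rough sketch of the setup I have in mind (the wrapper class, the `aux_weight` knob and the placeholder pooling are just illustrative):

```python
import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration


class T5WithAuxClassifier(nn.Module):
    """T5 generation plus an auxiliary classification head on the encoder output."""

    def __init__(self, model_name: str = "t5-small", num_labels: int = 2, aux_weight: float = 0.5):
        super().__init__()
        self.t5 = T5ForConditionalGeneration.from_pretrained(model_name)
        self.classifier = nn.Linear(self.t5.config.d_model, num_labels)
        self.aux_weight = aux_weight

    def forward(self, input_ids, attention_mask, labels, class_labels):
        # Seq2seq forward pass; the output also exposes the encoder's last hidden state.
        outputs = self.t5(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        # encoder_last_hidden_state: [batch size, seq length, hidden size]
        enc = outputs.encoder_last_hidden_state
        # Which position to pool for classification is exactly my question;
        # taking the last index is just a placeholder here.
        pooled = enc[:, -1, :]
        logits = self.classifier(pooled)
        cls_loss = nn.functional.cross_entropy(logits, class_labels)
        # Combine the generation loss with the auxiliary classification loss.
        loss = outputs.loss + self.aux_weight * cls_loss
        return loss, logits
```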
It depends on the model. BERT uses the first one (where the [CLS] token is), some models use a pooling of all the hidden states, and others use the one for the last token (which is not necessarily at index -1, since you could have padding). I'd look at what is done in T5ForSequenceClassification and copy the code.
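For illustration, here is a rough, untested sketch of those three options, assuming an attention_mask is at hand:

```python
import torch


def pool_hidden_states(hidden_states: torch.Tensor,
                       attention_mask: torch.Tensor,
                       strategy: str = "last") -> torch.Tensor:
    """Pool [batch, seq, hidden] down to [batch, hidden]."""
    if strategy == "first":
        # BERT-style: take the first position (where the [CLS] token sits).
        return hidden_states[:, 0, :]
    if strategy == "mean":
        # Mean over the non-padded positions only.
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
        return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    if strategy == "last":
        # Last *real* token of each sequence, not index -1, because of padding.
        last_idx = (attention_mask.sum(dim=1) - 1).long()
        batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
        return hidden_states[batch_idx, last_idx, :]
    raise ValueError(f"Unknown strategy: {strategy}")
```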
You are absolutely right. That's what I tried to do at first. However, the T5 model has no T5ForSequenceClassification class or anything similar. I think the most suitable option is to use the last logits, taking padding into account. Do you have any function in mind that could be helpful?
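One idea I am considering is to key off the `</s>` token that T5's tokenizer appends to every input, since its position is the last non-padded one. A rough sketch, assuming `</s>` has id 1 (T5's default) and appears exactly once per sequence:

```python
import torch


def last_token_representation(hidden_states: torch.Tensor,
                              input_ids: torch.Tensor,
                              eos_token_id: int = 1) -> torch.Tensor:
    """Select the hidden state at the final </s> token of each sequence."""
    eos_mask = input_ids.eq(eos_token_id)                    # [batch, seq]
    if not torch.all(eos_mask.sum(dim=1) == 1):
        raise ValueError("Expected exactly one </s> per sequence.")
    batch, _, hidden = hidden_states.shape
    # Boolean indexing keeps only the eos positions, one per example.
    return hidden_states[eos_mask, :].view(batch, hidden)
```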
I would also like to ask if there is any way to tie the weights of the encoder and the decoder.
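One candidate I am looking at is the generic `tie_encoder_decoder` flag from `PretrainedConfig`. A minimal sketch, with a check at the end, since T5's decoder blocks have an extra cross-attention layer and I am not sure which parameters actually end up shared:

```python
from transformers import T5ForConditionalGeneration

# Assumption: tie_encoder_decoder ties parameters whose module paths match
# between the encoder and decoder stacks; cross-attention layers exist only
# in the decoder, so they cannot be shared.
model = T5ForConditionalGeneration.from_pretrained(
    "t5-small", tie_encoder_decoder=True
)

# Verify what actually got shared, e.g. the self-attention query projection
# of the first block: the same storage pointer means the weights are tied.
enc_q = model.encoder.block[0].layer[0].SelfAttention.q.weight
dec_q = model.decoder.block[0].layer[0].SelfAttention.q.weight
print(enc_q.data_ptr() == dec_q.data_ptr())
```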