I’m trying to create sentence embeddings using different Transformer models. I’ve created my own class where I pass in a Transformer model, and I want to call the model to get a sentence embedding.
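Simplified, the idea is something like this (the class and method names here are just placeholders, and I'm assuming a recent transformers version where the model call returns an output object):

import torch
from transformers import AutoModel, AutoTokenizer

class SentenceEmbedder:
    # Illustrative wrapper: load any Transformer checkpoint and ask it for one sentence vector.
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

    def embed(self, sentence):
        inputs = self.tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)
        # This is the part that works for BERT/RoBERTa but not for XLNet.
        return outputs.pooler_output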
Both BertModel and RobertaModel return a pooler_output, which is what I use as the sentence embedding. The documentation describes it as:
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
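For example (a minimal check, again assuming a recent transformers version):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This is a test sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # per-token states: (batch_size, seq_len, hidden_size)
print(outputs.pooler_output.shape)      # pooled sentence vector: (batch_size, hidden_size) = (1, 768)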
Why does XLNetModel not produce a similar pooler_output?
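To make the gap concrete, this is what I see when I run both models on the same input:

import torch
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "xlnet-base-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer("This is a test sentence.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Prints True for bert-base-uncased, False for xlnet-base-cased.
    print(name, getattr(outputs, "pooler_output", None) is not None)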
When I look at the source code for XLNetForSequenceClassification, I see that it actually does build a sentence embedding internally, using a function called sequence_summary():
def forward(self, ...):
    transformer_outputs = self.transformer(...)
    output = transformer_outputs[0]         # last hidden state for every token: (batch, seq_len, hidden)
    output = self.sequence_summary(output)  # pooled, sentence-level representation: (batch, hidden)
Why is this sequence_summary() function not used consistently across the other Transformers models, such as BertForSequenceClassification and RobertaForSequenceClassification?
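In the meantime, I'm building the sentence embedding myself from XLNetModel's per-token states. Mean pooling over the attention mask is just my own workaround, not something the library prescribes:

import torch
from transformers import XLNetModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("This is a test sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# XLNetModel returns no pooler_output, so pool the per-token states manually.
hidden = outputs.last_hidden_state                     # (batch_size, seq_len, hidden_size)
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (batch_size, seq_len, 1)
sentence_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                        # (batch_size, hidden_size) = (1, 768)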