I’m trying to create sentence embeddings using different Transformer models. I’ve created my own class where I pass in a Transformer model, and I want to call the model to get a sentence embedding.
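Simplified, the idea is something like this (the class and method names here are just placeholders, and I'm assuming a recent transformers version where the model call returns an output object):

import torch
from transformers import AutoModel, AutoTokenizer

class SentenceEmbedder:
    # Illustrative wrapper: load any Transformer checkpoint and ask it for one sentence vector.
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

    def embed(self, sentence):
        inputs = self.tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)
        # This is the part that works for BERT/RoBERTa but not for XLNet.
        return outputs.pooler_output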
Both BertModel and RobertaModel return a pooler_output, which is what I use as the sentence embedding. The documentation describes it as:
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
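For example (a minimal check, again assuming a recent transformers version):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This is a test sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # per-token states: (batch_size, seq_len, hidden_size)
print(outputs.pooler_output.shape)      # pooled sentence vector: (batch_size, hidden_size) = (1, 768)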
Why does XLNetModel not produce a similar pooler_output?
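To make the gap concrete, this is what I see when I run both models on the same input:

import torch
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "xlnet-base-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer("This is a test sentence.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Prints True for bert-base-uncased, False for xlnet-base-cased.
    print(name, getattr(outputs, "pooler_output", None) is not None)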
When I look at the source code for XLNetForSequenceClassification, I see that it actually does build a sentence embedding internally, using a function called sequence_summary():
def forward(self, ...):
    transformer_outputs = self.transformer(...)
    output = transformer_outputs[0]         # last hidden state for every token: (batch, seq_len, hidden)
    output = self.sequence_summary(output)  # pooled, sentence-level representation: (batch, hidden)
Why is this sequence_summary() function not used consistently across the other Transformers models, such as BertForSequenceClassification and RobertaForSequenceClassification?
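In the meantime, I'm building the sentence embedding myself from XLNetModel's per-token states. Mean pooling over the attention mask is just my own workaround, not something the library prescribes:

import torch
from transformers import XLNetModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("This is a test sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# XLNetModel returns no pooler_output, so pool the per-token states manually.
hidden = outputs.last_hidden_state                     # (batch_size, seq_len, hidden_size)
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (batch_size, seq_len, 1)
sentence_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                        # (batch_size, hidden_size) = (1, 768)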