What should be used as sentence embedding for BertModel?

I want to get sentence embedding vectors to use in downstream classification tasks.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer('this is a test.', return_tensors="pt")
outputs = model(**inputs)
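
As a side note, recent transformers releases return a model-output object by default, so the same tensors can also be read by name instead of by index (a small sketch; check your installed version):

last_hidden_state = outputs.last_hidden_state  # same tensor as outputs[0]
pooled = outputs.pooler_output                 # same tensor as outputs[1]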

If I do it this way:
embedding_of_sentence = outputs[1]

Here, according to the documentation, the outputs[1] is the:
* **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.

So outputs[1] is the last-layer hidden state of the first token ([CLS]), passed through the pooling head, which seems right for sentence classification.
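
To make this concrete, here is a minimal sketch that reproduces pooler_output from the raw [CLS] hidden state; it assumes the standard BertModel internals, where model.pooler holds the Linear layer (pooler.dense) and the Tanh activation:

import torch

# Reproduce outputs[1] by hand: take the last-layer hidden state of the
# [CLS] token and pass it through the pooler's Linear + Tanh.
cls_hidden = outputs[0][:, 0]  # shape: (batch_size, hidden_size)
manual_pooled = torch.tanh(model.pooler.dense(cls_hidden))
print(torch.allclose(manual_pooled, outputs[1], atol=1e-6))  # expect True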

However, another post suggests that you should “usually only take the hidden states of the [CLS] token of the last layer”,

and the code is:

embedding_of_last_layer = outputs[0][0]             # hidden states of the first (only) sentence in the batch
embedding_of_sentence = embedding_of_last_layer[0]  # raw hidden state of the [CLS] token
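
For comparison, printing the shape at each indexing step shows what this second snippet actually selects (a sketch using the outputs from the code above; 768 is the hidden size of bert-base-uncased):

print(outputs[0].shape)        # (1, seq_len, 768): last_hidden_state for the batch
print(outputs[0][0].shape)     # (seq_len, 768): hidden states of the single sentence
print(outputs[0][0][0].shape)  # (768,): raw [CLS] hidden state, before the pooler

So this second method returns the [CLS] hidden state without the extra Linear + Tanh step, which is why the two results differ.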

The two methods give different results for the sentence embedding. Which one is correct, or better?
