BERTweet pooler_output for random individual words is almost identical... why?

Out of curiosity, I am looking at the pooler_output for individual words (like “cat”) using BERTweet:

import torch
from transformers import AutoModel, BertweetTokenizer

model = AutoModel.from_pretrained("vinai/bertweet-base")
# BertweetTokenizer has no fast implementation, so use_fast is not passed
tokenizer = BertweetTokenizer.from_pretrained("vinai/bertweet-base")
text = "cat"
normalized = tokenizer.normalizeTweet(text)
input_ids = torch.tensor([tokenizer.encode(normalized)])
with torch.no_grad():  # no gradients needed; also lets .numpy() work without detach()
    outputs = model(input_ids)["pooler_output"].numpy().squeeze()

I did this for several words (“cat”, “dog”, “god”, “truck”, “rooster”) and plotted a sampling of the outputs (every 10th dimension):
[plot: every 10th pooler_output value for each of the five words]
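
For reference, here is roughly how the vectors and the plot were produced. This is a sketch, not my exact plotting code; it assumes the model and tokenizer from the snippet above, plus matplotlib, and pooler_vector is just a helper defined here for convenience:

import matplotlib.pyplot as plt

words = ["cat", "dog", "god", "truck", "rooster"]

def pooler_vector(word):
    # Encode a single word and return its 768-dim pooler_output as a numpy array
    ids = torch.tensor([tokenizer.encode(tokenizer.normalizeTweet(word))])
    with torch.no_grad():
        return model(ids)["pooler_output"].numpy().squeeze()

for word in words:
    plt.plot(pooler_vector(word)[::10], label=word)  # sample every 10th dimension

plt.xlabel("dimension index (every 10th)")
plt.ylabel("pooler_output value")
plt.legend()
plt.show()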

I was a bit surprised that the vectors would be so similar. I’m assuming that, because of the way the model is built, there is some “baseline” signature whenever only a single word is input, and that baseline is mostly what shows up here. Is there a good explanation of this phenomenon somewhere?
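
To put a number on “almost identical”, here is a quick pairwise cosine-similarity check (again just a sketch, reusing the words list and the pooler_vector helper from the plotting snippet above):

import numpy as np
from itertools import combinations

vectors = {w: pooler_vector(w) for w in words}
for a, b in combinations(words, 2):
    # cosine similarity between the two pooler_output vectors
    cos = np.dot(vectors[a], vectors[b]) / (np.linalg.norm(vectors[a]) * np.linalg.norm(vectors[b]))
    print(f"cosine({a}, {b}) = {cos:.4f}")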