How to get the embedding matrix of BERT in Hugging Face

Actually, that’s not possible unless you compute the cosine similarity between the mean of the last hidden state and the embedding vector of each token in BERT’s vocabulary. You can do that easily with sklearn.
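
As a concrete starting point, here is a minimal sketch of computing the mean of the last hidden state (the model name, input sentence, and variable names are illustrative assumptions, not from the thread):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This is an example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state into a single sentence vector, shape (1, 768)
sentence_vector = outputs.last_hidden_state.mean(dim=1)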

The embedding matrix of BERT can be obtained as follows:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# Weight matrix of the input token embedding layer, shape (vocab_size, hidden_size)
embedding_matrix = model.embeddings.word_embeddings.weight
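
Combining the two, the sklearn comparison mentioned above would then look roughly like this (again a sketch under the same assumptions, reusing sentence_vector and embedding_matrix from the snippets above):

from sklearn.metrics.pairwise import cosine_similarity

# One similarity score per vocabulary entry, shape (1, 30522) for bert-base-uncased
similarities = cosine_similarity(sentence_vector.numpy(), embedding_matrix.detach().numpy())

Note the .detach(): the embedding matrix is a trainable parameter, so it has to be detached from the graph before converting it to NumPy.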

However, I’m not sure it is useful to compare the vector of an entire sentence with individual rows of the embedding matrix: the sentence vector is a “summary” of the whole sentence, whereas each row of the matrix represents a single vocabulary token.
