How to get embedding matrix of bert in hugging face

Betacat · September 24, 2021, 2:32am

I have tried to build sentence pooling with the BERT model provided by Hugging Face:

import torch
from transformers import BertModel, BertTokenizer

model_name = 'bert-base-uncased'

tokenizer = BertTokenizer.from_pretrained(model_name)
# load the pretrained model
model = BertModel.from_pretrained(model_name)
input_text = "Here is some text to encode"
# tokenizer: text -> token ids (with [CLS] and [SEP] added)
input_ids = tokenizer.encode(input_text, add_special_tokens=True)
# input_ids: [101, 2182, 2003, 2070, 3793, 2000, 4372, 16044, 102]
input_ids = torch.tensor([input_ids])

with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # first output: shape [1, 9, 768]
last_hidden_states = last_hidden_states.mean(1)  # average over the 9 tokens
print(last_hidden_states)
# size of last_hidden_states is now [1, 768]

Now I want to know which word in the vocabulary this vector refers to.
So how can I get the embedding matrix, whose size is [sequence_length, embedding_length], and then compute last_hidden_states @ matrix to find the word this vector refers to in the vocabulary?
Please help me.

nielsr · September 24, 2021, 7:42am

Hi,

The last_hidden_states are a tensor of shape (batch_size, sequence_length, hidden_size). In your example, the text “Here is some text to encode” gets tokenized into 9 tokens (the input_ids) - actually 7 but 2 special tokens are added, namely [CLS] at the start and [SEP] at the end. So the sequence length is 9. The batch size is 1, as we only forward a single sentence through the model. And the hidden_size of a BERT-base-sized model is 768. Hence, the last hidden states will have shape (1, 9, 768). You can then get the last hidden state vector of each token, e.g. if you want to get it for the first token, you would have to type last_hidden_states[:,0,:]. If you want to get it for the second token, then you have to type last_hidden_states[:,1,:], etc.
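
The shapes described above can be checked with a minimal sketch. As an assumption made only to avoid downloading weights, a tiny randomly initialised BertModel (hidden_size=64) stands in for bert-base-uncased; with the real checkpoint the last dimension would be 768 instead of 64. The input ids are the nine ids from the original post.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a small random BERT replaces "bert-base-uncased" so that
# nothing has to be downloaded. Only the hidden size differs (64 vs 768).
config = BertConfig(
    vocab_size=30522,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
model = BertModel(config).eval()

# The nine token ids from the question: [CLS] + 7 word pieces + [SEP].
input_ids = torch.tensor([[101, 2182, 2003, 2070, 3793, 2000, 4372, 16044, 102]])

with torch.no_grad():
    outputs = model(input_ids)

# Shape is (batch_size, sequence_length, hidden_size).
last_hidden_state = outputs.last_hidden_state
print(last_hidden_state.shape)  # torch.Size([1, 9, 64])

first_token = last_hidden_state[:, 0, :]         # vector of [CLS]
second_token = last_hidden_state[:, 1, :]        # vector of the first word piece
sentence_vector = last_hidden_state.mean(dim=1)  # mean over the 9 tokens
```

Note that taking the mean over dim=1 removes the sequence dimension, which is why the vector in the question has shape [1, 768] rather than [1, 9, 768].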

Also, the code example you refer to seems a bit outdated. Where did you get it from? We’ll update it.

Betacat · September 25, 2021, 8:57am

Thank you very much for your help!
Actually, I am a student from China and I got this code from a Chinese coding website, so you don’t need to update it.
But I still have a question: I want to get the word that my last_hidden_state refers to. There are 7 words in the input sentence, and I take the mean of their vectors, so the size is [1,768]. I want to “decode” it into the word it refers to in the vocabulary.
Usually in BERT, we first convert the words to one-hot codes using the provided vocabulary, then embed them and feed the embedding sequence into the encoder. I want to “de-embed” the output tensor of BERT, that is, multiply this tensor by the transpose of the embedding matrix. But how can I get the transpose of that matrix?
My second question is that the documentation does not seem to provide enough example code to understand the structure of the model (maybe I am just too inexperienced).

nielsr · September 26, 2021, 9:03am

Actually, that’s not possible, unless you compute cosine similarity between the mean of the last hidden state and the embedding vectors of each token in BERT’s vocabulary. You can do that easily using sklearn.

The embedding matrix of BERT can be obtained as follows:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
embedding_matrix = model.embeddings.word_embeddings.weight

However, I’m not sure it is useful to compare the vector of an entire sentence with each of the rows of the embedding matrix, as the sentence vector is a “summary” of the entire sentence.
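
A rough sketch of the cosine-similarity comparison mentioned above, under some assumptions: torch.nn.functional.cosine_similarity is used here in place of sklearn, a tiny randomly initialised BertModel stands in for bert-base-uncased, and the input ids are made up. With random weights the ranking is meaningless, so this only illustrates the mechanics.

```python
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertModel

torch.manual_seed(0)

# Assumption: a small random BERT replaces the pretrained checkpoint.
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
model = BertModel(config).eval()

# Token embedding matrix, shape (vocab_size, hidden_size).
embedding_matrix = model.embeddings.word_embeddings.weight

# Hypothetical token ids, for illustration only.
input_ids = torch.tensor([[1, 17, 42, 256, 999, 2]])

with torch.no_grad():
    # Mean of the last hidden state, shape (1, hidden_size).
    sentence_vector = model(input_ids).last_hidden_state.mean(dim=1)
    # Broadcasts (1, hidden) against (vocab, hidden) -> (vocab_size,).
    similarities = F.cosine_similarity(sentence_vector, embedding_matrix, dim=-1)

# Ids of the five vocabulary rows most similar to the sentence vector.
top_scores, top_ids = similarities.topk(5)
print(top_ids.tolist())

# With a real tokenizer, the ids could be mapped back to tokens via
# tokenizer.convert_ids_to_tokens(top_ids.tolist()).
```

Row i of the embedding matrix corresponds to token id i in the tokenizer's vocabulary, which is what makes the mapping from top_ids back to tokens possible.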

ShivaniSri · January 9, 2022, 3:17am

If I modify this embedding matrix, how do I forward it to the BERT encoder layers?

canovich · June 23, 2022, 6:08pm

Do these embeddings include the position and segment embeddings? I mean, are they obtained by summing the token embeddings, segment embeddings, and positional embeddings?
And this is the embedding before it enters the encoder layers, am I right?
Thanks in advance.

nielsr · June 24, 2022, 9:25am

Hi,

These only include the token embeddings. The position embeddings and token type (segment) embeddings are contained in separate matrices.

And yes, the token, position and token type embeddings all get summed before being fed to the Transformer encoder.
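
To illustrate, here is a minimal sketch that looks up the three matrices separately, sums them, and compares the result with what the model's own embeddings module produces. It assumes the standard BertEmbeddings layout in transformers (word_embeddings, position_embeddings, token_type_embeddings, followed by LayerNorm and dropout) and uses a tiny randomly initialised model with made-up token ids rather than the pretrained checkpoint.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a small random BERT replaces the pretrained checkpoint.
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
model = BertModel(config).eval()  # eval() disables dropout
emb = model.embeddings

# Hypothetical token ids, for illustration only.
input_ids = torch.tensor([[1, 17, 42, 256, 999, 2]])
seq_len = input_ids.shape[1]
position_ids = torch.arange(seq_len).unsqueeze(0)  # 0, 1, ..., seq_len - 1
token_type_ids = torch.zeros_like(input_ids)       # single segment

with torch.no_grad():
    token_part = emb.word_embeddings(input_ids)
    position_part = emb.position_embeddings(position_ids)
    segment_part = emb.token_type_embeddings(token_type_ids)

    # Sum of the three lookups, then LayerNorm (dropout is off in eval mode).
    manual = emb.LayerNorm(token_part + position_part + segment_part)

    # What the model itself computes before the encoder layers.
    reference = emb(input_ids=input_ids, token_type_ids=token_type_ids)

print(torch.allclose(manual, reference, atol=1e-5))
```

On the earlier question about forwarding modified embeddings: BertModel's forward also accepts an inputs_embeds argument in place of input_ids, which replaces only the token embedding lookup; the position and token type embeddings are still added inside the model.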

canovich · June 24, 2022, 1:54pm

Thank you so much.

catmoez · October 31, 2024, 3:30pm

Thanks for this. How would we get the word labels/vocabulary list that matches these vectors?
