I have tried to build sentence pooling with the BERT model provided by Hugging Face:
import torch
from transformers import BertModel, BertTokenizer

model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

input_text = "Here is some text to encode"
# tokenizer -> token ids
input_ids = tokenizer.encode(input_text, add_special_tokens=True)
# input_ids: [101, 2182, 2003, 2070, 3793, 2000, 4372, 16044, 102]
input_ids = torch.tensor([input_ids])

with torch.no_grad():
    # model outputs are tuples; the first element holds the hidden states
    last_hidden_states = model(input_ids)[0]  # shape [1, sequence_length, 768]

# mean-pool over the sequence dimension to get one sentence vector
last_hidden_states = last_hidden_states.mean(1)
print(last_hidden_states)  # shape [1, 768]
Now I want to know which word in the vocabulary this pooled vector corresponds to.
So how can I get the embedding matrix, whose size is [vocab_size, embedding_dim], and then compute last_hidden_states @ matrix.T to find the word in the vocabulary this vector is closest to?
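Here is a minimal sketch of what I have in mind, assuming the input embedding matrix can be read from model.embeddings.word_embeddings.weight (shape [vocab_size, 768], i.e. [30522, 768] for bert-base-uncased) and that a plain dot product is a reasonable way to rank tokens:

# BERT's input embedding matrix, shape [vocab_size, 768]
embedding_matrix = model.embeddings.word_embeddings.weight

with torch.no_grad():
    # score the pooled vector against every vocabulary embedding
    scores = last_hidden_states @ embedding_matrix.T  # shape [1, vocab_size]
    best_id = scores.argmax(dim=1).item()

# map the best-scoring id back to its token string
print(tokenizer.convert_ids_to_tokens([best_id]))

Is this the right approach, or is there a built-in way to map a pooled vector back to the vocabulary?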
Please help me.