Extracting token embeddings from pretrained language models

kadaj13 · June 16, 2021, 10:27am

Thank you very much for your great help.

I understood how to get the values for each token, but there is one thing I am confused about. I ran all the code you kindly posted here, and for the sentence "this is a test", printing len(data[0]) gives 6 instead of the 4 I expected.

I have attached my code and its output as a screenshot. Do you know what could be wrong?
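A likely cause, assuming the model is BERT-style (for example bert-base-uncased) and that data[0] holds one vector per input token: the tokenizer adds the special token [CLS] at the start and [SEP] at the end, so four words become six positions. With a Hugging Face tokenizer, calling tokenizer.convert_ids_to_tokens on the input ids shows these extra tokens, and passing add_special_tokens=False when tokenizing leaves them out. Many setups instead keep the special tokens for the model and drop the first and last vectors afterwards. The sketch below does not call the library. It is a hypothetical, pure-Python imitation of that wrapping, and it assumes each word maps to a single word piece, which holds for common words like these but not in general.

```python
from typing import List

# Hypothetical sketch: imitates how a BERT-style tokenizer wraps one sentence.
# Assumption: every word maps to a single word piece (true for common words
# like these; real WordPiece tokenizers may split rarer words further).
def bert_style_tokens(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, then add the [CLS] and [SEP] special tokens."""
    word_pieces = sentence.lower().split()
    return ["[CLS]"] + word_pieces + ["[SEP]"]

tokens = bert_style_tokens("this is a test")
print(tokens)       # ['[CLS]', 'this', 'is', 'a', 'test', '[SEP]']
print(len(tokens))  # 6, the same count reported for len(data[0])

# If data[0] has one row per position, slicing data[0][1:-1] drops the
# [CLS] and [SEP] rows and keeps only the four word vectors.
# Here the same slice is applied to the token list.
word_tokens = tokens[1:-1]
print(word_tokens, len(word_tokens))  # ['this', 'is', 'a', 'test'] 4
```

If the batch is padded, [SEP] is no longer the last position, so the slice would need the attention mask (or the true sequence length) rather than a fixed [1:-1].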
