Extracting token embeddings from pretrained language models

Thank you very much for your great help.

I understand how to get the values for each token, but there is one thing I am confused about. I ran all the code you kindly wrote here, and for the sentence “this is a test”, printing len(data[0]) gives me 6 instead of 4.

I have attached my code and output as a screenshot. Do you know what could be wrong?
