The inputs into BERT are token IDs. How do we get the corresponding input token VECTORS?

The token ID is used in the embedding layer, which you can think of as a matrix whose row indices are all possible token IDs (so one row for each item in the vocabulary, e.g. 30K rows). Every token therefore has a (learned!) representation. Beware, though, that this is not the same as word2vec or similar approaches: BERT is context-sensitive, and the embedding layer is not trained specifically to be used by itself. It only serves as the input to the model, together with potentially other embeddings such as the token type and position embeddings. Getting those embeddings by themselves is not very useful. If you want to get output representations for each word, this post may be helpful: Generate raw word embeddings using transformer models like BERT for downstream process - #2 by BramVanroy
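
As a minimal sketch of what that lookup looks like in practice (assuming the `bert-base-uncased` checkpoint and the Hugging Face `transformers` library; the variable names are just illustrative):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The embedding layer is a lookup table of shape (vocab_size, hidden_size),
# e.g. roughly (30522, 768) for bert-base-uncased
embedding_layer = model.get_input_embeddings()

# Token IDs simply index rows of that matrix
inputs = tokenizer("Hello world", return_tensors="pt")
token_vectors = embedding_layer(inputs["input_ids"])
print(token_vectors.shape)  # (1, seq_len, 768)
```

Note that these are only the raw input vectors; the context-sensitive representations come out of the encoder layers on top (e.g. `model(**inputs).last_hidden_state`).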
