I am working on a project where I have some named entities and I know their locations in the text. I want to use separate embedding vectors for those names. How do I modify the tokenizer, and how do I add learnable vectors to the embedding matrix? Any suggestions or guidance would be highly appreciated.
After looking into the documentation (here), I found that
`tokenizer.add_special_tokens` is what I was looking for. However, after
`model.resize_token_embeddings(len(tokenizer))`, is there any way to make only these new embeddings trainable?
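For reference, the basic flow I have so far looks like this (a sketch assuming `bert-base-uncased`; the `[ENT1]`/`[ENT2]` entity tokens are made-up placeholders):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

old_vocab_size = len(tokenizer)

# Register the entity names as additional special tokens so the
# tokenizer never splits them into subwords.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[ENT1]", "[ENT2]"]}
)

# Grow the embedding matrix to match; the new rows are randomly
# initialized and trainable like the rest of the layer.
model.resize_token_embeddings(len(tokenizer))
```

After this, `model.get_input_embeddings().weight` has `old_vocab_size + num_added` rows.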
[I am not an expert]
I think the answer is No (unless you want to write some very detailed code).
Freezing is done on a per-layer basis, so either all the embeddings are trainable or all of them are not.
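For completeness, the "very detailed code" route would be something like a gradient-masking hook: the whole embedding layer stays trainable, but you zero out the gradient rows of the original vocabulary on every backward pass. A sketch in plain PyTorch (the sizes here are arbitrary; with a real model you would use `model.get_input_embeddings()` and your actual vocabulary size):

```python
import torch
import torch.nn as nn

old_vocab_size, num_new, dim = 100, 2, 16
emb = nn.Embedding(old_vocab_size + num_new, dim)

def freeze_old_rows(grad):
    # Zero the gradient for the original vocabulary rows so only
    # the newly added embeddings actually receive updates.
    grad = grad.clone()
    grad[:old_vocab_size] = 0.0
    return grad

emb.weight.register_hook(freeze_old_rows)

# A batch mixing old token ids (0, 50) with the two new ids.
ids = torch.tensor([[0, 50, old_vocab_size, old_vocab_size + 1]])
loss = emb(ids).sum()
loss.backward()
# emb.weight.grad is now zero for the first old_vocab_size rows
# and nonzero only for the new ones.
```

Note this only masks gradients on the input embedding matrix; everything else about whether it is worth doing still stands.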
I’m not sure it would even make sense to write code that updates only your new embeddings: BERT’s representations are contextual, so the surrounding weights shape what those vectors mean.
Are you sure you have enough data to train BERT to produce useful embeddings for your new names?
Thanks for your reply. I have just started working with BERT and am trying out different things. I am not sure there is enough data to learn useful new embeddings, but I shall try it anyway.