I am working on a project where I have some named entities and I know their locations in the text. I want to use separate embedding vectors for those names. How do I modify the tokenizer, and how do I add learnable vectors to the embedding matrix? Any suggestions or guidance would be highly appreciated.
After looking into the documentation (here), I found that
`tokenizer.add_special_tokens` is what I was looking for. However, after
`model.resize_token_embeddings(len(tokenizer))`, is there any way to make only these new embeddings trainable?
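For reference, the basic flow I have so far looks like this (a sketch assuming `bert-base-uncased`; the `[ENT1]`/`[ENT2]` entity tokens are made-up placeholders):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

old_vocab_size = len(tokenizer)

# Register the entity names as additional special tokens so the
# tokenizer never splits them into subwords.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[ENT1]", "[ENT2]"]}
)

# Grow the embedding matrix to match; the new rows are randomly
# initialized and trainable like the rest of the layer.
model.resize_token_embeddings(len(tokenizer))
```

After this, `model.get_input_embeddings().weight` has `old_vocab_size + num_added` rows.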
[I am not an expert]
I think the answer is No (unless you want to write some very detailed code).
Freezing is done on a per-layer basis, so either all the embeddings are trainable or all of them are not.
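For completeness, the "very detailed code" route would be something like a gradient-masking hook: the whole embedding layer stays trainable, but you zero out the gradient rows of the original vocabulary on every backward pass. A sketch in plain PyTorch (the sizes here are arbitrary; with a real model you would use `model.get_input_embeddings()` and your actual vocabulary size):

```python
import torch
import torch.nn as nn

old_vocab_size, num_new, dim = 100, 2, 16
emb = nn.Embedding(old_vocab_size + num_new, dim)

def freeze_old_rows(grad):
    # Zero the gradient for the original vocabulary rows so only
    # the newly added embeddings actually receive updates.
    grad = grad.clone()
    grad[:old_vocab_size] = 0.0
    return grad

emb.weight.register_hook(freeze_old_rows)

# A batch mixing old token ids (0, 50) with the two new ids.
ids = torch.tensor([[0, 50, old_vocab_size, old_vocab_size + 1]])
loss = emb(ids).sum()
loss.backward()
# emb.weight.grad is now zero for the first old_vocab_size rows
# and nonzero only for the new ones.
```

Note this only masks gradients on the input embedding matrix; everything else about whether it is worth doing still stands.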
I’m not sure it would even make sense to write code that updates only your new embeddings: BERT’s representations are contextual, so the surrounding weights shape what those vectors mean.
Are you sure you have enough data to train BERT to produce useful embeddings for your new names?
Thanks for your reply. I have just started working with BERT and am trying out different things. I am not sure there is enough data to learn useful new embeddings, but I shall try it anyway.