Working with named entities with BERT

pritam · August 29, 2020, 2:26pm

I am working on a project where I have some named entities and I know their locations in the text. I want to use separate embedding vectors for those names. How do I modify the tokenizer and how do I add learnable vectors to the embedding matrix? Any suggestion or guide will be highly appreciated.

Edit:
After looking through the documentation, I found (here) that tokenizer.add_special_tokens is what I was looking for. However, after calling model.resize_token_embeddings(len(tokenizer)), is there any way to make only these new embeddings trainable?

rgwatwormhill · August 30, 2020, 11:48am

[I am not an expert]

I think the answer is No (unless you want to write some very detailed code).

Freezing is done per parameter tensor (via requires_grad), and the whole embedding matrix is a single tensor, so out of the box either all the embeddings are trainable or none of them are.

I’m not sure it would even make sense to write code that alters only your new embeddings. BERT as a whole only works in context.

Are you sure you have enough data to train BERT to produce useful embeddings for your new names?
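
For what it's worth, the "very detailed code" route can be sketched fairly compactly with a gradient hook that zeroes the gradient for every row except the newly added ones. This is only a minimal sketch in plain PyTorch, not a tested recipe for BERT: the nn.Embedding below is a stand-in for the resized matrix (with a Hugging Face model you would presumably hook model.get_input_embeddings().weight instead), and old_vocab_size, num_new_tokens and hidden_size are made-up toy values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, chosen only for illustration.
old_vocab_size = 10   # stands in for the original (pretrained) vocabulary size
num_new_tokens = 2    # stands in for the tokens added with add_special_tokens
hidden_size = 4

# Stand-in for the embedding matrix after resize_token_embeddings.
embedding = nn.Embedding(old_vocab_size + num_new_tokens, hidden_size)

# Mask with 1.0 for rows that may be updated (the new tokens) and 0.0 elsewhere.
grad_mask = torch.zeros(old_vocab_size + num_new_tokens, 1)
grad_mask[old_vocab_size:] = 1.0

# The hook runs during backward and replaces the gradient with the masked one,
# so the original rows always receive a zero gradient.
embedding.weight.register_hook(lambda grad: grad * grad_mask)

# Plain SGD without weight decay: a zero gradient then means no update at all.
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

before = embedding.weight.detach().clone()

# One dummy training step that touches both old ids (1, 3) and new ids (10, 11).
token_ids = torch.tensor([[1, 3, 10, 11]])
loss = embedding(token_ids).pow(2).sum()
loss.backward()
optimizer.step()

after = embedding.weight.detach()
old_rows_unchanged = torch.equal(before[:old_vocab_size], after[:old_vocab_size])
new_rows_changed = not torch.equal(before[old_vocab_size:], after[old_vocab_size:])
print(old_rows_unchanged, new_rows_changed)
```

Two caveats. An optimizer with weight decay (AdamW with its default settings, for example) will still shrink the "frozen" rows even when their gradient is zero, so the embedding parameters would need weight_decay=0 or the old rows would have to be restored after each step. And if the rest of the model should stay fixed as well, its other parameters still need requires_grad = False set separately.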

pritam · August 30, 2020, 1:25pm

Thanks for your reply. I have just started working with BERT and am trying out different things. I am not sure if there is enough data to create new embeddings, but I shall try it anyway.
