Annotate a NER dataset (for BERT)

Calin · September 2, 2021, 9:47am

I am working on annotating a dataset for the purpose of named entity recognition.

In principle, I have seen that for multi-word (not single-word) entities, annotations work like this (see the example below):

Romania ( B-CNT )
United States of America ( B-CNT C-CNT C-CNT C-CNT )

where B-CNT stands for “beginning-country” and C-CNT represents “continuing-country”.
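For illustration, here is a minimal sketch of how the scheme above could be produced from token-level span annotations. The function name `tag_spans`, the `(start, end, label)` span format, and the `O` tag for tokens outside any entity are assumptions made for this example; they do not come from any particular library.

```python
def tag_spans(tokens, spans, outside="O"):
    """Return one tag per token using the B-/C- scheme described above.

    spans: list of (start, end, label) with token indices, end exclusive.
    Tokens not covered by any span get the `outside` tag (an assumption).
    """
    tags = [outside] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"C-{label}"          # every following token
    return tags


tokens = ["United", "States", "of", "America"]
print(tag_spans(tokens, [(0, 4, "CNT")]))
# ['B-CNT', 'C-CNT', 'C-CNT', 'C-CNT']
```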

The problem I face is that I have a case (not related to countries) where I need to annotate like B-W GAP_WORD C-W C-W.

How should I proceed with the annotation in this case?

If I annotate as in the scheme above, should I expect a BERT-like entity recognition system to learn and detect that a phrase can look like B-W GAP_WORD C-W C-W, or does the “C-W” (continuation word) need to come immediately after the B-W (beginning word)?

Which of the following two solutions is correct?

B-W GAP_WORD C-W C-W
B-W GAP_WORD B-W C-W

And if it is the second, how do I then connect the two B-Ws (which actually belong to the same entity)?
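One possible way to make that connection for the second option is a post-processing step: decode the predicted tags into fragments, then group neighbouring fragments that share a label and are separated by only a small gap. The sketch below is only a heuristic under stated assumptions: the function names, the `O` tag for the gap word, and the `max_gap` threshold are all hypothetical, and nothing here is confirmed as the approach used in this thread.

```python
def decode_spans(tags):
    """Collect (start, end, label) fragments from B-/C- tags (end exclusive)."""
    spans = []
    current = None  # [start, end, label] of the fragment being built
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            if current:
                spans.append(tuple(current))
            current = [i, i + 1, label]
        elif prefix == "C" and current and current[2] == label:
            current[1] = i + 1              # extend the open fragment
        else:
            if current:
                spans.append(tuple(current))
            current = None                  # gap word or stray tag
    if current:
        spans.append(tuple(current))
    return spans


def merge_fragments(spans, max_gap=1):
    """Group same-label fragments separated by at most max_gap tokens."""
    entities = []
    for span in spans:
        if entities:
            prev = entities[-1][-1]
            if prev[2] == span[2] and span[0] - prev[1] <= max_gap:
                entities[-1].append(span)
                continue
        entities.append([span])
    return entities


tags = ["B-W", "O", "B-W", "C-W"]           # second option, gap tagged as O
print(merge_fragments(decode_spans(tags)))
# [[(0, 1, 'W'), (2, 4, 'W')]]
```

Note that this heuristic would also merge two genuinely separate entities of the same type that happen to sit next to each other, so the threshold needs checking against real data.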

TG1 · November 23, 2021, 9:55pm

Did you find a solution to this problem? I am working on this right now and want to label entities that are multi-word. So far I have just labelled them all as individual words, but it's a pretty bad way to do this.

Calin · October 12, 2022, 9:03am

Hey TG1. I am really sorry to see your message only now. I went forward with the second approach and had decent results at that time.

new-in-town · May 29, 2024, 3:54pm

Hi @Calin, hello @TG1

it seems our problems are very similar, look:

What I want to achieve: to find a model that recognizes these places:

   Syracuse, NY
   Athens, United States

as ONE entity. Well, Athens, United States ( B-CNT C-CNT C-CNT ) would be fine too.

Very similar to your problem, isn’t it?
