BERT Split NER Labeling

ethanwager01 · November 23, 2021, 3:04am

I'm building a custom-label NER model for custom medical data. In my dataset, an entity is sometimes split by non-entity words.

For a simple example, say I was designing training data for a person label that looked for names. The sentence “His name was John Smith” would be O, O, O, B-PER, I-PER. That makes sense, but free text gets messy. Imagine situations like this:

John, the man and legend, Smith, will be remembered forever.

Would BERT understand…

B-PER, O, O, O, O, I-PER, O, O, O, O

See how the split occurred? Both parts should get the same label, but I’m not sure whether I should create IOB data as above or have two separate instances of B-PER. The issue is that I want the model to understand that they are connected.
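To make the two options concrete, here is a minimal sketch in plain Python that builds both label sequences for the sentence above. The helper `label_fragments` and the `(start, end)` fragment format are hypothetical names made up for this illustration, not part of any library, and the sketch says nothing about which scheme BERT learns better.

```python
from typing import List, Tuple

# Hypothetical fragment format: (start, end) word indices, end exclusive.
Fragment = Tuple[int, int]


def label_fragments(n_words: int, fragments: List[Fragment],
                    tag: str, restart: bool) -> List[str]:
    """Label one entity that may be split into several fragments.

    restart=False: only the very first word gets B-, every later entity
                   word gets I-, even after a gap of O words.
    restart=True:  each fragment starts again with its own B-.
    """
    labels = ["O"] * n_words
    first_fragment = True
    for start, end in sorted(fragments):
        for i in range(start, end):
            begins = (i == start) and (first_fragment or restart)
            labels[i] = ("B-" if begins else "I-") + tag
        first_fragment = False
    return labels


# Punctuation is left out so there is one label per word, as in the post.
words = ["John", "the", "man", "and", "legend",
         "Smith", "will", "be", "remembered", "forever"]
fragments = [(0, 1), (5, 6)]  # "John" and "Smith"

continued = label_fragments(len(words), fragments, "PER", restart=False)
separate = label_fragments(len(words), fragments, "PER", restart=True)

for word, a, b in zip(words, continued, separate):
    print(f"{word:12} {a:6} {b:6}")
```

Either way, a token classifier only predicts one label per word, so neither sequence explicitly records that "John" and "Smith" belong to the same entity. My assumption is that the link would have to come from a post-processing step or a different annotation scheme.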

I’m playing with Bio-clinicalBert and it’s done well for NER. I’m just trying to get it to the next level.

Thanks in advance, and I’d be happy to share more data if needed.

ethanwager01 · December 7, 2021, 3:30pm

Anyone?

Will keep playing on my own in the meantime
