Custom DistilBERT NER - Entity recognition issues for entities with spaces

bharadwajswarna · March 30, 2022, 5:38pm

I have custom-trained a DistilBERT NER model for skill extraction from job descriptions.

The model works decently for single-word entities, but fails when an entity consists of two words.

For example:
testcase_1 = "The candidate needs to be good at python and machine learning"
Here the model extracts two entities → 1. python 2. machine

testcase_2 = "The candidate needs to be good at python and machinelearning"
Here the model extracts → 1. python 2. machinelearning

How do I deal with this issue? My training data has examples of "machine learning", i.e., with the space.
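One thing that may be worth checking is how the two-word examples are labelled in the training data. The sketch below is only an illustration, not the actual preprocessing used here: it assumes whitespace tokenization, a BIO tagging scheme, and a hypothetical label name `SKILL`. The helper `bio_tags` is made up for this example. If "learning" ends up tagged `O` (or `B-SKILL` instead of `I-SKILL`) in the real data, the model has no signal to continue the entity past "machine".

```python
def bio_tags(tokens, entity_phrases, label="SKILL"):
    """Tag whitespace tokens with BIO labels for the given phrases.

    Hypothetical helper: `label` and the exact-match lookup are
    assumptions made for illustration only.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in entity_phrases:
        words = phrase.lower().split()
        n = len(words)
        # Slide a window over the sentence and tag every exact match.
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == words:
                tags[i] = f"B-{label}"          # first word of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # continuation words
    return tags


tokens = "The candidate needs to be good at python and machine learning".split()
tags = bio_tags(tokens, ["python", "machine learning"])

# Print token/tag pairs to inspect how the two-word entity is labelled.
for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

With this scheme, "machine" should come out as `B-SKILL` and "learning" as `I-SKILL`. When a subword tokenizer splits words further, the word-level tags still have to be aligned to the subword tokens before training, which is a separate step not shown here.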

Also, I have tried the various aggregation strategies (simple/first/average/max) and the results are the same.
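A possible reason the aggregation strategy makes no difference: aggregation only merges tokens that the model has already tagged as part of an entity, so it cannot recover a word that was predicted as `O`. The sketch below is a simplified stand-in for that grouping step, not the pipeline's actual implementation. The function `group_entities`, the `SKILL` label, and the two prediction lists are hypothetical and exist only to illustrate the point.

```python
def group_entities(tokens, tags):
    """Merge consecutive B-/I- tagged tokens into entity strings.

    Simplified sketch: entity types are ignored, and an I- tag with no
    open entity before it is dropped.
    """
    entities = []
    current = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [token]                   # start a new entity
        elif tag.startswith("I-") and current:
            current.append(token)               # extend the open entity
        else:
            if current:
                entities.append(" ".join(current))
            current = []                        # an O tag closes the entity
    if current:
        entities.append(" ".join(current))
    return entities


tokens = "The candidate needs to be good at python and machine learning".split()

# Hypothetical predictions: "learning" tagged as a continuation vs. as O.
tags_with_continuation = ["O"] * 7 + ["B-SKILL", "O", "B-SKILL", "I-SKILL"]
tags_without_continuation = ["O"] * 7 + ["B-SKILL", "O", "B-SKILL", "O"]

merged = group_entities(tokens, tags_with_continuation)
truncated = group_entities(tokens, tags_without_continuation)
print(merged)
print(truncated)
```

If the raw per-token predictions look like the second case, the fix is more likely to be in the training labels or the label alignment than in the aggregation settings.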

Thanks!
