I have custom-trained a DistilBERT NER model for skill extraction from job descriptions.
The model works decently for single-word entities, but fails when an entity consists of two words.
For example:
testcase_1 = “The candidate needs to be good at python and machine learning”
In this case, the model extracts two entities → 1. python 2. machine (instead of “machine learning”)
testcase_2 = “The candidate needs to be good at python and machinelearning”
In this case, the model extracts → 1. python 2. machinelearning
How do I deal with this issue? My training data does contain examples of “machine learning”, i.e., with the space.
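For a two-word skill to come out as one entity, the training labels usually need a BIO scheme in which the second word is tagged as a continuation (I-) of the first, and those word-level labels must be carried over correctly to sub-word tokens. The sketch below assumes that scheme; the label names B-SKILL / I-SKILL / O, the helper align_labels, and the word_ids list are all hypothetical. With a real fast tokenizer, word_ids would come from tokenizer(words, is_split_into_words=True).word_ids(). If “learning” is labeled O or B-SKILL in the training data, the model is being taught to split the entity.

```python
# Minimal sketch, assuming a BIO tag scheme with a hypothetical SKILL label.
LABEL2ID = {"O": 0, "B-SKILL": 1, "I-SKILL": 2}


def align_labels(word_labels, word_ids, label_all_tokens=False):
    """Map one label per word onto one label per sub-word token.

    Special tokens (word_id None) get -100 so the loss ignores them.
    Only the first sub-token of a word gets the word's label; later
    sub-tokens get -100 unless label_all_tokens is True, in which case
    they get the I- version of the word's label.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(-100)
        elif wid != previous:
            aligned.append(LABEL2ID[word_labels[wid]])
        elif label_all_tokens:
            label = word_labels[wid]
            if label.startswith("B-"):
                # A continuation piece of a B- word is tagged I- of the same type.
                label = "I-" + label[2:]
            aligned.append(LABEL2ID[label])
        else:
            aligned.append(-100)
        previous = wid
    return aligned


words = ["good", "at", "python", "and", "machine", "learning"]
# "learning" continues the entity started by "machine", so it is I-SKILL.
labels = ["O", "O", "B-SKILL", "O", "B-SKILL", "I-SKILL"]
# Hypothetical word_ids: [CLS], six words with "python" split into two
# pieces, then [SEP].
word_ids = [None, 0, 1, 2, 2, 3, 4, 5, None]

print(align_labels(labels, word_ids))
print(align_labels(labels, word_ids, label_all_tokens=True))
```

Checking a handful of training sentences this way (does every “learning” after “machine” end up as I-SKILL?) is a cheap way to rule out a labeling problem before changing the model.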
Also, I have tried the various aggregation strategies (simple/first/average/max), and the results are the same.
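A likely reason the aggregation strategy makes no difference: as far as I understand the transformers pipeline, first/average/max only decide how the sub-word pieces inside one word are combined, while merging separate words into one entity depends on the model predicting B- for the first word and I- for the following ones. The simplified grouper below is a hypothetical stand-in for that logic (not the library's actual code). It shows that if the model tags “learning” as O, no grouping step can recover “machine learning”.

```python
# Simplified, hypothetical BIO grouper; not the transformers implementation.
def group_entities(words, tags):
    """Return a list of (entity_type, text) spans from word-level BIO tags."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and current_type == tag[2:]:
            # I- tag of the same type extends the open entity.
            current_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if tag != "O":
            # B-X (or a stray I-X) starts a new entity.
            current_type, current_words = tag[2:], [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = ["good", "at", "python", "and", "machine", "learning"]
predicted = ["O", "O", "B-SKILL", "O", "B-SKILL", "O"]       # what the model seems to do
desired = ["O", "O", "B-SKILL", "O", "B-SKILL", "I-SKILL"]   # what it should do

print(group_entities(words, predicted))  # [('SKILL', 'python'), ('SKILL', 'machine')]
print(group_entities(words, desired))    # [('SKILL', 'python'), ('SKILL', 'machine learning')]
```

Inspecting the raw per-token predictions (aggregation_strategy="none") should show which of these two cases applies.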
Thanks!