NER pipeline aggregation for BILOU

The simple aggregation strategy for an NER pipeline

nlp = pipeline("ner", model=model_directory, aggregation_strategy="simple")

aggregates correctly with BIO tags, but not with BILOU-style tags. Is there an easy way to fix this?

I can rewrite the label map myself

nlp.model.config.id2label = {k: v.replace('L-', 'I-').replace('U-', 'B-') for k, v in nlp.model.config.id2label.items()}

but is there a built-in way to handle label formats other than BIO?
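
One caveat with the one-liner above: str.replace matches anywhere in the label, so an entity name that itself contains "L-" or "U-" would also be rewritten. A minimal sketch of a prefix-only remap is below. PREFIX_MAP, bilou_to_bio and remap_id2label are illustrative names, not part of transformers, and the sketch assumes the labels use only the B-, I-, L- and U- prefixes plus O.

```python
# Illustrative helpers (not part of transformers): rewrite only the tag prefix.
PREFIX_MAP = {"L": "I", "U": "B"}  # assumed BILOU -> BIO prefix mapping


def bilou_to_bio(label: str) -> str:
    """Map one BILOU label to its BIO equivalent; other labels pass through."""
    prefix, sep, entity = label.partition("-")
    if sep and prefix in PREFIX_MAP:
        return f"{PREFIX_MAP[prefix]}-{entity}"
    return label


def remap_id2label(id2label: dict) -> dict:
    """Return a copy of id2label with every label converted to BIO."""
    return {idx: bilou_to_bio(label) for idx, label in id2label.items()}


# Applied to the pipeline from the question, this would be:
# nlp.model.config.id2label = remap_id2label(nlp.model.config.id2label)
```

Splitting on the first hyphen keeps entity names such as MODEL-X intact, which the chained replace calls would not.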


The simplest way I found is to edit config.json and change the "id2label" dictionary so that every label maps to its IOB equivalent:
"id2label": {
  "0": "O",
  "1": "B-DISORDER",
  "2": "B-DISORDER",
  "3": "I-DISORDER",
  "4": "I-DISORDER",
  "5": "B-FINDING",
  "6": "B-FINDING",
  "7": "I-FINDING",
  "8": "I-FINDING"
},
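
The same edit to config.json can be scripted rather than done by hand. This is a rough sketch using only the standard library. patch_config is a hypothetical helper, and it assumes the file has an "id2label" entry whose BILOU labels use the L- and U- prefixes. It leaves "label2id" untouched.

```python
import json
from pathlib import Path

PREFIX_MAP = {"L": "I", "U": "B"}  # assumed BILOU -> IOB prefix mapping


def patch_config(config_path):
    """Rewrite id2label in a config.json so L-/U- labels become I-/B- labels."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    patched = {}
    for idx, label in config["id2label"].items():
        prefix, sep, entity = label.partition("-")
        if sep and prefix in PREFIX_MAP:
            label = f"{PREFIX_MAP[prefix]}-{entity}"
        patched[idx] = label

    # Only id2label is changed; every other key is written back as it was read.
    config["id2label"] = patched
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return patched


# Example call, using the model_directory variable from the question:
# patch_config(Path(model_directory) / "config.json")
```

Keeping a backup of the original config.json is worthwhile, since the file is overwritten in place.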
Hope this helps,

Herman