NER pipeline aggregation for BILOU

jambo · December 1, 2021, 11:14am

The simple aggregation strategy for an NER pipeline

nlp = pipeline("ner", model=model_directory, aggregation_strategy="simple")

aggregates correctly if we use BIO tags, but not with BILOU-style tags. Is there a way to amend this easily?

I can remap the labels on the loaded model myself:

nlp.model.config.id2label = {k: v.replace('L-', 'I-').replace('U-', 'B-') for k, v in nlp.model.config.id2label.items()}

but is there a built-in way to handle cases where the labels are not in BIO format?
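For reference, the remapping above can also be done on the tag prefix only, so that a label whose entity name happens to contain "L-" or "U-" is left alone. This is just a sketch: the helper names `bilou_to_bio` and `remap_id2label` and the example label set are made up for illustration and are not part of transformers.

```python
def bilou_to_bio(label: str) -> str:
    """Map one BILOU tag to BIO: L- becomes I-, U- becomes B-, the rest is unchanged."""
    prefix, sep, entity = label.partition("-")
    if not sep:
        return label  # "O" or any label without a prefix
    bio_prefix = {"L": "I", "U": "B"}.get(prefix, prefix)
    return f"{bio_prefix}-{entity}"


def remap_id2label(id2label: dict) -> dict:
    """Return a new id2label dict with every label converted to BIO."""
    return {idx: bilou_to_bio(label) for idx, label in id2label.items()}


# Hypothetical BILOU label set, only for demonstration.
example = {0: "O", 1: "B-DISORDER", 2: "U-DISORDER", 3: "I-DISORDER", 4: "L-DISORDER"}
remapped = remap_id2label(example)

# With a loaded pipeline this would be applied as in the snippet above:
# nlp.model.config.id2label = remap_id2label(nlp.model.config.id2label)
```

Unlike chained `str.replace` calls, this only touches the part before the first hyphen, so a label such as `B-SOUL-ITEM` stays intact.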

HMuys · December 4, 2021, 2:25pm

The simplest way I found is to edit config.json and change the "id2label" dictionary so that it maps to IOB tags:
"id2label": {
  "0": "O",
  "1": "B-DISORDER",
  "2": "B-DISORDER",
  "3": "I-DISORDER",
  "4": "I-DISORDER",
  "5": "B-FINDING",
  "6": "B-FINDING",
  "7": "I-FINDING",
  "8": "I-FINDING"
},
Hope this helps,

Herman
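If editing config.json by hand is error-prone, the same change could be scripted. The following is a rough sketch, not an official transformers utility: the function name `rewrite_id2label` and the demo label order are assumptions, and it leaves "label2id" untouched on the assumption that only inference-time decoding needs the BIO labels.

```python
import json
import tempfile
from pathlib import Path


def rewrite_id2label(config_path) -> dict:
    """Rewrite the id2label entry of a config.json from BILOU to BIO in place."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    prefix_map = {"L": "I", "U": "B"}  # L- becomes I-, U- becomes B-
    new_labels = {}
    for idx, label in config["id2label"].items():
        prefix, sep, entity = label.partition("-")
        # Labels without a prefix (such as "O") are kept as they are.
        new_labels[idx] = f"{prefix_map.get(prefix, prefix)}-{entity}" if sep else label
    config["id2label"] = new_labels
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return new_labels


# Demo on a throwaway file with a hypothetical BILOU label order.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_config = {"id2label": {"0": "O", "1": "B-FINDING", "2": "U-FINDING",
                                "3": "I-FINDING", "4": "L-FINDING"}}
    demo_path.write_text(json.dumps(demo_config), encoding="utf-8")
    demo_result = rewrite_id2label(demo_path)
    demo_on_disk = json.loads(demo_path.read_text(encoding="utf-8"))["id2label"]
```

Keeping a backup of the original config.json is probably wise, since the BILOU labels are lost once the file is overwritten.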
