NER tags and aggregation strategy

I’m trying to do NER tagging, and I have been using the pipeline to get predictions from my models.

Issue: aggregation_strategy="simple" does a good job, but the tags are grouped. How can I avoid the tags being grouped? I want tags like I-PER, not PER. On the other hand, I tried aggregation_strategy="none"; there the tags are generated the way I want, but words are split due to tokenization.

@sgugger can you please help me out with this?


I have tried the "first" and "max" options as well, but to no avail.

Maybe I am not understanding your question correctly, but aggregation_strategy is used to group the entities in the predictions. So removing aggregation_strategy (it defaults to "none") should give you the BIO(ES) tags individually.

Take a look at the code:

Pipeline Token Classification

use aggregation_strategy instead. Whether or not to group the tokens corresponding to the same entity together in the predictions or not.
aggregation_strategy (str, optional, defaults to "none"):
The strategy to fuse (or not) tokens based on the model prediction.
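
If what you want is one B-/I- tag per whole word, one option is to keep aggregation_strategy="none" and merge the sub-word pieces yourself. Below is a minimal sketch, not something the library provides: merge_subwords is a hypothetical helper, and the token list is hand-written to mimic the shape of the pipeline output rather than produced by a real model. It assumes each prediction carries start/end character offsets (these are only present when the tokenizer supports offsets) and that words are separated by whitespace. Each merged word keeps the tag of its first sub-token.

```python
def merge_subwords(text, tokens):
    """Merge adjacent sub-word tokens into whole words.

    `tokens` mimics the output of a token-classification pipeline run with
    aggregation_strategy="none": dicts with "entity", "start" and "end".
    Assumption: a token that starts exactly where the previous one ended
    (no whitespace in between) is a continuation of the same word.
    """
    words = []
    for tok in tokens:
        if words and tok["start"] == words[-1]["end"]:
            # Continuation piece: extend the previous word, keep its tag.
            words[-1]["end"] = tok["end"]
            words[-1]["word"] = text[words[-1]["start"]:tok["end"]]
        else:
            # New word: take its tag from this first sub-token.
            words.append({
                "entity": tok["entity"],
                "word": text[tok["start"]:tok["end"]],
                "start": tok["start"],
                "end": tok["end"],
            })
    return words


text = "Clara Schumann lives in Leipzig"

# Hand-written stand-in for pipeline output (not real model predictions).
tokens = [
    {"entity": "B-PER", "word": "Clara", "start": 0, "end": 5},
    {"entity": "I-PER", "word": "Sc", "start": 6, "end": 8},
    {"entity": "I-PER", "word": "##hum", "start": 8, "end": 11},
    {"entity": "I-PER", "word": "##ann", "start": 11, "end": 14},
    {"entity": "B-LOC", "word": "Leipzig", "start": 24, "end": 31},
]

merged = merge_subwords(text, tokens)
for w in merged:
    print(w["entity"], w["word"])
```

With this input, "Sc", "##hum" and "##ann" come back as the single word "Schumann" tagged I-PER, so the BIO prefix survives. Note that punctuation attached to a word would be merged into it under the no-gap assumption, so you may need to adapt the rule to your tokenizer.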

If you meant something different please let me know.
