I’m trying to do NER tagging, and I have been using the pipeline to get predictions from my models.
Issue: aggregation_strategy="simple" does a good job, but the tags are grouped. How can I avoid the tags being grouped? I want tags like I-PER, not PER. On the other hand, I tried aggregation_strategy="none"; there the tags are generated the way I want, but words are split due to tokenization.
@sgugger can you please help me out with this?
I have tried the "first" and "max" options as well, but to no avail.
Maybe I am not understanding your question correctly, but aggregation_strategy is used to group the entities in the predictions. So removing aggregation_strategy should give you the BIO(ES) tags individually.
Take a look at the code:
Pipeline Token Classification
use aggregation_strategy instead. Whether or not to group the tokens corresponding to the same entity together in the predictions or not.
aggregation_strategy (str, optional, defaults to "none"): The strategy to fuse (or not) tokens based on the model prediction.
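If the goal is whole words that still carry their B-/I- prefixes, one option is to post-process the aggregation_strategy="none" output yourself. Below is a minimal sketch, not something the library provides: merge_subword_tags is a hypothetical helper, it assumes a WordPiece-style tokenizer (continuation pieces start with "##"), and it assumes each prediction has the entity, word, start and end keys that the pipeline normally returns. The sample predictions are hand-written for illustration, not real model output.

```python
def merge_subword_tags(text, token_preds):
    """Merge WordPiece continuation pieces ("##...") into whole words.

    Each merged word keeps the tag predicted for its first piece, so the
    B-/I- prefixes survive. Hypothetical helper, not part of transformers.
    """
    words = []
    for tok in token_preds:
        is_continuation = tok["word"].startswith("##")
        if is_continuation and words and tok["start"] == words[-1]["end"]:
            # Extend the previous word's character span; keep its tag.
            words[-1]["end"] = tok["end"]
        else:
            words.append(
                {"entity": tok["entity"], "start": tok["start"], "end": tok["end"]}
            )
    # Recover the surface form of each word from the original text.
    for w in words:
        w["word"] = text[w["start"]:w["end"]]
    return words


# Hand-written sample shaped like aggregation_strategy="none" output
# (score and index omitted for brevity).
text = "Ada Lovelace lives in London"
token_preds = [
    {"entity": "B-PER", "word": "Ada", "start": 0, "end": 3},
    {"entity": "I-PER", "word": "Love", "start": 4, "end": 8},
    {"entity": "I-PER", "word": "##lace", "start": 8, "end": 12},
    {"entity": "B-LOC", "word": "London", "start": 22, "end": 28},
]

merged = merge_subword_tags(text, token_preds)
for w in merged:
    print(w["word"], w["entity"])
```

Taking the tag of the first piece is a design choice that mirrors what the "first" strategy does for scoring, just without collapsing the B-/I- prefix. Tokenizers that do not use the "##" convention would need a different continuation check.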
If you meant something different, please let me know.