How to get NER pipeline output to match spaCy's output?

crazydiv · July 12, 2020, 7:30am

Hi all,

In a production setup I am already using transformers and need NER for a task. The issue I'm facing is that, unlike spaCy, the NER output here is at the token level. What is the quickest way (or post-processing step) to produce output like spaCy's, i.e. entities at the string level with character indices?
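As a rough illustration of the post-processing asked about here, the sketch below merges token-level predictions into spaCy-style `(start_char, end_char, label)` spans. It assumes each prediction dict carries an `entity` tag in BIO/IOB form plus `start`/`end` character offsets into the input text (recent `transformers` releases include these offsets; older ones may not). `merge_token_predictions` is a hypothetical helper, not a library function, and the sample predictions are hand-written, not real model output.

```python
from typing import Dict, List, Optional, Tuple

# (start_char, end_char, label), mirroring spaCy's ent.start_char, ent.end_char, ent.label_
Span = Tuple[int, int, str]


def merge_token_predictions(text: str, tokens: List[Dict]) -> List[Span]:
    """Merge token-level BIO/IOB predictions into character-level entity spans.

    Assumes each prediction dict has an "entity" tag (e.g. "B-PER", "I-PER", "O")
    and "start"/"end" character offsets into `text`, sorted by position.
    """
    spans: List[Span] = []
    current: Optional[List] = None  # [start, end, label] of the span being built

    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            # An outside token closes any open span.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None
            continue

        # Split "B-PER" into ("B", "PER"); treat a bare "PER" like "I-PER".
        prefix, label = tag.split("-", 1) if "-" in tag else ("I", tag)

        # A token continues the open span only if the label matches, the tag is
        # not "B-", and only whitespace separates them (the pipeline drops "O"
        # tokens by default, so a larger gap may hide non-entity words).
        continues = (
            current is not None
            and prefix != "B"
            and label == current[2]
            and text[current[1]:tok["start"]].strip() == ""
        )
        if continues:
            current[1] = tok["end"]  # extend the span over this token or sub-word
        else:
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [tok["start"], tok["end"], label]

    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


# Hypothetical token-level predictions, hand-written for illustration.
text = "Hugging Face is based in New York City"
preds = [
    {"entity": "I-ORG", "word": "Hu", "start": 0, "end": 2},
    {"entity": "I-ORG", "word": "##gging", "start": 2, "end": 7},
    {"entity": "I-ORG", "word": "Face", "start": 8, "end": 12},
    {"entity": "I-LOC", "word": "New", "start": 25, "end": 28},
    {"entity": "I-LOC", "word": "York", "start": 29, "end": 33},
    {"entity": "I-LOC", "word": "City", "start": 34, "end": 38},
]
for start, end, label in merge_token_predictions(text, preds):
    print(start, end, label, text[start:end])
# 0 12 ORG Hugging Face
# 25 38 LOC New York City
```

Checking the gap text rather than just comparing labels keeps two separate same-type entities (e.g. "John and Mary") from being merged once the "O" tokens have been filtered out.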

Also, thanks for hosting this forum. As on other Discourse forums, we can now ask simple curiosity questions without filing them as “issues” on GitHub.

valhalla · July 12, 2020, 8:25am

Hi @crazydiv,
thank you for joining the forum!

I think this issue is fixed in this PR.

crazydiv · July 12, 2020, 9:01am

Thanks, the flag I was looking for is grouped_entities=True. It isn’t documented in the task summary; it appears in the source definition but isn’t explained in the parameter list, so I missed it.

from transformers import pipeline
nlp = pipeline("ner", grouped_entities=True)
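A minimal sketch of turning the grouped output into spaCy-shaped data, assuming each grouped entity dict has `entity_group`, `start` and `end` keys (check your installed version, since offsets may be missing in older releases). `to_spacy_style` is a hypothetical helper; it builds the dict layout that spaCy's displaCy accepts with `manual=True`. Note that later `transformers` releases deprecate `grouped_entities` in favour of `aggregation_strategy` (e.g. `aggregation_strategy="simple"`).

```python
from typing import Dict, List


def to_spacy_style(text: str, grouped: List[Dict]) -> Dict:
    """Convert grouped NER pipeline output into spaCy-style character spans.

    Assumes each entity dict has "entity_group", "start" and "end" keys.
    The result uses the layout displaCy accepts with manual=True.
    """
    ents = [
        {"start": e["start"], "end": e["end"], "label": e["entity_group"]}
        for e in sorted(grouped, key=lambda e: e["start"])  # keep spans in text order
    ]
    return {"text": text, "ents": ents, "title": None}


# Hand-written example in the assumed shape of grouped pipeline output.
text = "Hugging Face is based in New York City"
grouped = [
    {"entity_group": "ORG", "score": 0.99, "word": "Hugging Face", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.99, "word": "New York City", "start": 25, "end": 38},
]
doc_like = to_spacy_style(text, grouped)
for ent in doc_like["ents"]:
    print(text[ent["start"]:ent["end"]], ent["label"])
# Hugging Face ORG
# New York City LOC
```

With a real pipeline this would be called as `to_spacy_style(text, nlp(text))`. The result can be passed to `displacy.render(..., style="ent", manual=True)`, or the offsets can be turned into actual spaCy spans with `doc.char_span(start, end, label=...)`.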

Out of curiosity, did you read about this in the docs somewhere, or did you know about this flag because you were following the PR you referred to?

valhalla · July 12, 2020, 9:51am

Yes, it’s not documented yet. I was following the PR.

Feel free to open a PR to add it to the docs.

cc @sgugger
