How to get NER pipeline output to match spaCy's output?

Hi all,

In a production setup, I am already using transformers and need NER for a task. The issue I’m facing is that, unlike spaCy, the NER output here is at the token level. What would be the quickest way, or the simplest post-processing, to generate output like spaCy's (string-level entities with character indices)?

And thanks for hosting this forum. Just like on other Discourse forums, we can now ask all the simple curiosity questions without treating them as an “issue” on GitHub.

hi @crazydiv
Thank you for joining the forum :slight_smile:

I think this issue is fixed in this PR

Thanks. I was looking for the flag grouped_entities=True. It isn’t documented in the task summary. It appears in the function definition in the source but isn’t explained in the parameter list, so I missed it.

from transformers import pipeline
nlp = pipeline("ner", grouped_entities=True)
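
In case it helps anyone on a version without the flag, here is a rough sketch of doing the grouping by hand. It is not the pipeline's own implementation. It assumes each token-level prediction is a dict with an "entity" tag in B-/I- form plus "start" and "end" character offsets, which may not be present in every transformers version. The helper name group_token_entities and the sample tokens are made up for illustration and are not real model output.

```python
from typing import Dict, List


def group_token_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge B-/I- tagged token predictions into character-level spans.

    Assumption: every token dict has "entity", "start" and "end" keys.
    """
    spans: List[Dict] = []
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            continue  # not part of any entity
        prefix, sep, label = tag.partition("-")
        if not sep:
            # Tag has no B-/I- prefix: treat it as a possible continuation.
            prefix, label = "I", tag
        last = spans[-1] if spans else None
        # Extend the previous span only if this token continues the same
        # label and nothing but whitespace separates the two in the text.
        continues = (
            last is not None
            and prefix == "I"
            and last["label"] == label
            and text[last["end"]:tok["start"]].strip() == ""
        )
        if continues:
            last["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written token predictions, for illustration only.
text = "Hugging Face is based in New York City."
tokens = [
    {"entity": "B-ORG", "start": 0, "end": 2},
    {"entity": "I-ORG", "start": 2, "end": 7},
    {"entity": "I-ORG", "start": 8, "end": 12},
    {"entity": "B-LOC", "start": 25, "end": 28},
    {"entity": "I-LOC", "start": 29, "end": 33},
    {"entity": "I-LOC", "start": 34, "end": 38},
]
spans = group_token_entities(text, tokens)
for span in spans:
    print(span)
```

Each resulting dict carries a label, start and end character offsets, and the matched text, which roughly corresponds to spaCy's ent.label_, ent.start_char, ent.end_char and ent.text.
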

Out of curiosity, did you read about this in the docs somewhere, or did you happen to know about the flag because you were following the PR you referred to?

Yes, it’s not documented yet. I was following the PR.

Feel free to open a PR and add it to the docs :grin:

cc @sgugger