NER: Treat whole sequence as one entity

Hey there,

I am facing a problem where I have strings that, taken as a whole, either represent a named entity or do not. I am struggling to find a model that handles this, or a way to force the usual NER models to classify the whole sequence rather than its substrings.

For some context, the strings can be in any language, and I am mostly interested in deciding between person/corporation/group and non-NE. Being able to narrow the classes down to those would be a big plus. Are there any simple solutions? So far my guess is that I have to build and train the NER top layer myself, but I might be missing something obvious.

Thanks a lot in advance!

Hello :hugs:
Can you give an example input and output for the model you have in mind?

Hello Merve, please excuse my late reply, I am in the middle of moving :upside_down_face:

Here are three example input → output pairs:

  1. “Alexander von Schönburg” → “person”
  2. “b. braun melsungen” → “company”
  3. “What a beautiful day” → (no entity)

So it is known that the whole sequence represents a named entity (or not), and I would like a model that makes use of that information. Otherwise there is a chance that the model classifies the substrings (1. “Alexander” → “person”, “Schönburg” → “location”; 2. “b. braun” → “person”, “melsungen” → “location”). It’s basically a simple sequence classification task instead of classic token-level NER.
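To make the sequence classification framing concrete, here is a minimal sketch. It assumes the Hugging Face `transformers` library and `xlm-roberta-base` as a multilingual checkpoint; both are my assumptions, as are the helper names `load_sequence_classifier` and `to_training_rows`. The label set is the one from my first post, so “company” in example 2 would map to “corporation”. The classification head created this way is randomly initialised and still has to be fine-tuned on labelled strings.

```python
# Label set from the first post: one label per whole string.
LABELS = ["non-NE", "person", "corporation", "group"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def to_training_rows(pairs):
    """Turn (text, label) pairs into dicts with integer class ids."""
    return [{"text": text, "label": LABEL2ID[label]} for text, label in pairs]


def load_sequence_classifier(checkpoint="xlm-roberta-base"):
    """Load a tokenizer and a model with ONE prediction per sequence.

    The checkpoint name is an assumption; any multilingual encoder should do.
    The import is kept local so the helpers above work without transformers.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


rows = to_training_rows([
    ("Alexander von Schönburg", "person"),
    ("b. braun melsungen", "corporation"),
    ("What a beautiful day", "non-NE"),
])
```

Because the model sees the whole string and emits a single label, there are no sub-spans that could disagree with each other.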

Hope that makes it a bit clearer?

Hello :wave:

And will each of your sequences definitely be a single entity? For example, what if you get the input below?
Alexander von Schönburg is working at b. braun melsungen.
How should it be classified?

The expected input is definitely just one entity. So ideally your example (and my example #3) would be classified as non-NE or some other class which makes it clear to me that I should manually check those samples since they are probably “dirt” in the data.
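One way to get such a "please check manually" bucket, sketched under the assumption that the model returns one logit per class for the whole string: only accept the top class when its softmax probability is high enough. The function name `label_or_review`, the `needs-review` label and the 0.7 threshold are all made up for illustration and would need tuning on real data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_or_review(logits, labels, min_confidence=0.7, review_label="needs-review"):
    """Return the top label, or a review marker if the model is unsure."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return review_label  # likely "dirt" in the data, check by hand
    return labels[best]


labels = ["non-NE", "person", "corporation", "group"]
confident = label_or_review([0.1, 4.0, 0.2, 0.0], labels)
unsure = label_or_review([1.0, 1.1, 0.9, 1.0], labels)
```

With these made-up logits, the first call has one clearly dominant class and the second is close to uniform, so it gets flagged for review.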

I feel like you could model this as a text classification task, but I don’t think it will be successful: my intuition is that NER models rely on context or on look-ups. Can’t you just take an NER model like this one, drop the spans, and return only the predicted label?

Yeah, that was my initial thought too, but then I gave tner (xlm-roberta) a shot as a fast and simple baseline and was baffled by how well it worked without any context. So I thought the obvious next step to increase performance was to make the task easier by using what is known about the input. Otherwise I need to figure out a way to handle the cases where multiple entities are detected.

Can you elaborate on the last part of your last sentence? By spans, do you mean the information on where the entities begin and end? Yeah, that’s pretty much what I do right now, and it works fine in many cases. My problem is that often multiple entities are identified in one sequence, and I need just a single label.
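For the multiple-entities case, a rough heuristic sketch: let every predicted span vote for its type, weighted by score and length, and fall back to non-NE when the spans cover too little of the string. I am assuming spans shaped like the aggregated output of the Hugging Face token-classification pipeline (`entity_group`, `score`, `start`, `end`); tner's output format may differ. The function name, the 0.8 coverage threshold and the scores below are invented for illustration, and the spans are assumed not to overlap.

```python
from collections import defaultdict


def collapse_to_single_label(text, spans, min_coverage=0.8, fallback="non-NE"):
    """Reduce several predicted entity spans to one label for the whole string."""
    total = sum(1 for ch in text if not ch.isspace())
    if total == 0 or not spans:
        return fallback

    covered = 0
    votes = defaultdict(float)
    for span in spans:
        # Count only the non-whitespace characters inside the span.
        chunk = text[span["start"]:span["end"]]
        length = sum(1 for ch in chunk if not ch.isspace())
        covered += length
        votes[span["entity_group"]] += span["score"] * length

    # Too much unlabelled text suggests a sentence rather than a bare entity.
    if covered / total < min_coverage:
        return fallback
    return max(votes, key=votes.get)


spans = [
    {"entity_group": "person", "score": 0.95, "start": 0, "end": 9},
    {"entity_group": "location", "score": 0.80, "start": 14, "end": 23},
]
single = collapse_to_single_label("Alexander von Schönburg", spans)
```

This only papers over the disagreement between sub-spans. It cannot tell a full sentence containing two real entities from a single name that the model happened to split, so the coverage threshold is a crude proxy.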

Thank you for your effort and patience! :hugs: