Adding new Entities to Flair NER models


I hope it’s not inappropriate to ask a question about Flair here. I noticed that Flair models are also hosted on the model hub, and I could not find the answer to my question anywhere else.
I have an NER problem that I need to tackle, and there is a nearly perfect existing model. The problem, however, is that it lacks a few entities that I need.
My question is: can I add new entities to the existing model, or do I need to train from scratch with the same corpus the original authors used, plus additional training data for my additional entities?

The Flair documentation mentions training models and even continuing training for existing models, but it doesn’t mention whether new entities can be added.

Any help would be much appreciated!


Hi @neuralpat, I’ve never tried this but I wonder whether you could fine-tune the existing NER model on a small corpus composed of a mix of the original annotations and the new ones you’d like to add (I think the mix is needed so the original model doesn’t forget the original entities)?

Alternatively, you could try just fine-tuning on the new annotations (perhaps without dropout or weight decay) and then compare the two approaches.

Thanks for your reply!
How do I add the new entity types though? I’d need to change the head to have more classes, right?
I also haven’t been able to find how exactly I need to feed the data into a model for token classification. There is an example in the docs but honestly I don’t get it.

Ah yes you’re right - silly me! In that case I wonder whether knowledge distillation would be a viable approach? The idea being that you use the existing fine-tuned NER tagger as the teacher and then initialise a student with the same model type, but a new head to accommodate the new entities.

You’ll still need a corpus involving a mix of the original entities and the new ones, but it might be that with knowledge distillation you can get by with fewer samples than the original corpus that was used to train the teacher.

The docs on NER are indeed a bit confusing, but there’s a very nice tutorial by Sylvain Gugger that you can use as a template:
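
For orientation, the step that usually causes confusion when feeding data to a token classification model is aligning word-level labels with subword tokens. Here is a minimal sketch of that step, assuming the word index per token is already available (Hugging Face fast tokenizers expose it via `word_ids()`); the function name `align_labels`, the label map and the example sentence are made up for illustration.

```python
# Align word-level NER labels to subword tokens.
# `word_ids` mimics what a Hugging Face fast tokenizer returns from
# `encoding.word_ids()`: None for special tokens, otherwise the index of
# the word each subword token came from.

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Return one label per token, given one label per word."""
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] / [SEP] carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First subword of a word gets the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords are either ignored or repeat the word's label.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Hypothetical example: "Angela Merkel visited Washington", where
# "Washington" is split into two subword tokens.
label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3}
word_labels = [label2id[t] for t in ["B-PER", "I-PER", "O", "B-LOC"]]
word_ids = [None, 0, 1, 2, 3, 3, None]  # [CLS] Angela Merkel visited Wash ##ington [SEP]

token_labels = align_labels(word_ids, word_labels)
print(token_labels)  # [-100, 1, 2, 0, 3, -100, -100]
```

Whether to label only the first subword or all subwords is a design choice; both are common, so it is worth trying each on your data.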

Much appreciated @lewtun !

Hey @lewtun, I have a follow-up.
You mentioned

Is it going to be a problem if the sentences in the original training set already contain the new entities, but obviously not annotated?
So if I just add new training data with those new entities annotated, will that be able to outweigh those same entities not being annotated in the original training data? (Logic tells me no, but I thought I’d ask.)

Do you know what I mean?

If I understand correctly, you’d like to re-use the training set that was used to fine-tune the original NER model (i.e. the teacher), but now include the new annotations for the student?

To be honest, I don’t know how well the student will be able to learn the new entities, since this depends a lot on the number of samples, the relative weight of the knowledge distillation term in the loss, etc.
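
To make the "relative weight" concrete, here is a rough per-token sketch of a distillation loss in plain Python. It is only one possible formulation, not a tested recipe for this use case: it assumes the student's first classes are the teacher's classes in the same order, and it applies the soft-label term to those old classes only. The names `distillation_loss`, `alpha` and `temperature` are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a hard-label and a soft-label term for one token.

    Assumption: the student's first len(teacher_logits) classes are the
    teacher's classes in the same order; the rest are the new entities.
    """
    n_old = len(teacher_logits)
    # Hard-label cross-entropy over all student classes (old and new).
    ce = -math.log(softmax(student_logits)[gold_label])
    # Soft-label KL(teacher || student), restricted to the old classes.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits[:n_old], temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # alpha sets the relative weight; temperature**2 is the usual rescaling.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


teacher_logits = [2.0, 0.5, -1.0]        # 3 original classes
student_logits = [1.5, 0.2, -0.5, 0.1]   # same 3 classes + 1 new class
for alpha in (0.1, 0.5, 0.9):
    loss = distillation_loss(student_logits, teacher_logits, gold_label=0, alpha=alpha)
    print(alpha, round(loss, 4))
```

With `alpha` close to 1 the student mostly follows the annotated labels; with `alpha` close to 0 it mostly imitates the teacher on the original classes.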

PS I’m also not sure if knowledge distillation is the right approach here - I think your use case is an open research problem, so I’d suggest seeing what has been tackled in the literature before committing too many resources with my half-baked suggestion :slight_smile:

I actually wasn’t even talking about knowledge distillation but more generally about a training set containing sentences where certain entities are annotated and others where they are not. It seems obvious to me that this would hurt performance, but I guess I wanted to make sure.

I’ve decided that I need to train from scratch, and I’ll be trying to annotate the original training set with the additional entities I need. Many of them I can cover with regex, so this shouldn’t be a problem.
Since I could cover them with regex, why even put them into the model? I’d like to have one NER model that annotates everything, instead of having to go through multiple steps of annotation. Does that make sense?
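
A minimal sketch of what such regex-based re-annotation could look like, assuming the corpus is available as token lists with BIO tags. The patterns, the `EMAIL` and `DATE` labels and the function name `add_regex_entities` are placeholders for illustration, not part of Flair.

```python
import re

# Hypothetical patterns for the additional entity types; replace with your own.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(
        r"\d{1,2} (?:January|February|March|April|May|June|July|August"
        r"|September|October|November|December) \d{4}"
    ),
}


def add_regex_entities(tokens, tags):
    """Return a copy of `tags` with regex matches written in as BIO tags.

    Matches that touch an already annotated token are skipped, so the
    existing annotations are kept as they are.
    """
    # Character offsets of each token in the space-joined sentence.
    text = " ".join(tokens)
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok) + 1

    new_tags = list(tags)
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Indices of all tokens that overlap the matched character span.
            covered = [i for i, (s, e) in enumerate(spans)
                       if s < m.end() and e > m.start()]
            if any(new_tags[i] != "O" for i in covered):
                continue
            for k, i in enumerate(covered):
                new_tags[i] = ("B-" if k == 0 else "I-") + label
    return new_tags


tokens = ["Angela", "Merkel", "wrote", "to", "info@example.org",
          "on", "3", "March", "2021", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "O", "O", "O"]

# Print one token and tag per line, the column layout commonly used for NER corpora.
for tok, tag in zip(tokens, add_regex_entities(tokens, tags)):
    print(tok, tag)
```

Running the same patterns over every sentence of the original corpus also avoids the situation where an entity is annotated in some sentences and silently left as "O" in others.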

I appreciate all your help!

You could try performing “model surgery”, adding one class to the classification layer but loading the original weights on the subset of the layer that’s pertinent.


Thank you for your input!

I thought of exchanging the entire head, but what I’m struggling to understand is what exactly the process would need to be.
Say I load the model in PyTorch directly, not through Flair, and then switch out the last Linear() with a new one, would I just be able to train that model properly?
Don’t I need the optimizer as well in order to continue training?
Or is my understanding incorrect here and I can theoretically just load up any model and train it as if it was from scratch? (Now that I’m typing this out, I actually see no reason why that shouldn’t work :thinking:)

ML is an experimental science, so I think pretty much anything can work; you just need to experiment.

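An nn.Linear layer has weights and biases that are tensors, so you can basically assign values to a slice of the weight tensor: weights[:num_classes, :] = old_weights (nn.Linear stores its weight as (out_features, in_features), so the classes are rows).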
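
A small PyTorch sketch of this kind of surgery, using a stand-in `nn.Linear` rather than an actual Flair model. The sizes and variable names are made up for illustration. A real tagger may have further parts that depend on the label set (for example its tag dictionary, or transition scores if a CRF is used), which this sketch does not cover.

```python
import torch
from torch import nn

hidden_size, num_old, num_new = 16, 5, 2  # illustrative sizes

# Stand-in for the classification layer of the pretrained tagger.
old_head = nn.Linear(hidden_size, num_old)

# New head with room for the extra entity classes.
new_head = nn.Linear(hidden_size, num_old + num_new)

# nn.Linear stores its weight as (out_features, in_features), so each class
# is a row. Copy the old rows and biases; the new rows keep their random init.
with torch.no_grad():
    new_head.weight[:num_old] = old_head.weight
    new_head.bias[:num_old] = old_head.bias

# The original classes produce the same logits as before the surgery.
x = torch.randn(3, hidden_size)
assert torch.allclose(new_head(x)[:, :num_old], old_head(x), atol=1e-6)

# A freshly created optimizer is enough to continue training; the state of
# the original optimizer is not required.
optimizer = torch.optim.SGD(new_head.parameters(), lr=0.1)
targets = torch.tensor([0, 5, 6])  # includes the two new class indices
loss = nn.functional.cross_entropy(new_head(x), targets)
loss.backward()
optimizer.step()
```

Starting with a new optimizer only discards things like momentum statistics, which are rebuilt during training.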

Hey @neuralpat, @lewtun
I am new to Flair models. Could you please help me understand how I can add new entities to a Flair NER model, and how I can create annotations for the new entities? Your help will be much appreciated.
Also, I have unlabeled news articles that need to be labeled with their entities using the Flair library.

Hey @krutik0204, you might be interested in checking out the zero-shot tutorial or posting your question on the flair repo: Issues · flairNLP/flair · GitHub