Calculate F1 score in a NER task with BERT

Hi everyone,
I fine-tuned a BERT model to perform a NER task using the BILUO scheme, and I have to calculate the F1 score.
However, in named-entity recognition, the F1 score is calculated per entity, not per token.
Moreover, there is the WordPiece “problem” and the BILUO format, so I should:

  1. aggregate the subwords into words
  2. remove the prefixes “B-”, “I-”, “L-”, “U-” from each label
  3. calculate the F1 score on the entities

Before I spend hours (if not days) trying to implement this, I would like to know whether an existing implementation is already available.
Thanks in advance :)

You should use the seqeval metric from the datasets library, which will do all of this for you. Check the new run_ner script for an example.
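
Note that seqeval expects one tag per word, so the word-piece predictions have to be collapsed to word level first. Here is a minimal sketch of that step, assuming you keep only the prediction of the first sub-token of each word (as far as I know, this matches the default behaviour of the run_ner example) and that `word_ids` is the list returned by a fast tokenizer's `word_ids()` method. The helper name `first_subword_labels` is made up for illustration.

```python
from typing import List, Optional


def first_subword_labels(word_ids: List[Optional[int]],
                         token_labels: List[str]) -> List[str]:
    """Collapse per-token labels to one label per word.

    word_ids: word index of each token, None for special tokens
              (assumed to come from a fast tokenizer's word_ids()).
    token_labels: predicted tag for each token, same length as word_ids.
    """
    word_labels: List[str] = []
    previous = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            continue  # skip special tokens such as [CLS] and [SEP]
        if word_id != previous:
            word_labels.append(label)  # first sub-token decides the word's tag
        previous = word_id
    return word_labels


# "[CLS] Wash ##ington is nice [SEP]" -> three words
word_ids = [None, 0, 0, 1, 2, None]
token_labels = ["O", "U-LOC", "L-LOC", "O", "O", "O"]
print(first_subword_labels(word_ids, token_labels))
```

Other aggregation strategies (majority vote, last sub-token) are possible; this one is just the simplest.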

Thanks for the hint, @sgugger.
I have a question.
Is the seqeval metric in datasets the same implementation as this?
If so, that is a problem in my case, because with the BILUO format the F1 score can be calculated only in strict mode, while I need the default one.

Ah yes, it’s the same, so it won’t be useful for your use case, sorry.
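
For anyone landing here with the same requirement, steps 2 and 3 from the question can be sketched in plain Python. This is only an illustration under stated assumptions, not seqeval's default mode: it assumes one BILUO tag per word, parses the tags leniently (an “I-” or “L-” tag without a preceding “B-” still opens an entity), and counts a prediction as correct only if type, start and end all match. The names `biluo_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, inclusive end index)


def biluo_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of word-level BILUO tags."""
    spans: Set[Span] = set()
    start, etype = None, None  # currently open span, if any
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        prefix, _, label = tag.partition("-")  # strips "B-", "I-", "L-", "U-"
        # Close the open span when this tag cannot continue it.
        if start is not None and (prefix in ("O", "B", "U") or label != etype):
            spans.add((etype, start, i - 1))
            start, etype = None, None
        if prefix == "U":
            spans.add((label, i, i))  # single-word entity
        elif prefix in ("B", "I", "L"):
            if start is None:  # lenient: I-/L- may open a span
                start, etype = i, label
            if prefix == "L":
                spans.add((etype, start, i))
                start, etype = None, None
    return spans


def entity_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of sentences."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_spans, pred_spans = biluo_spans(true_tags), biluo_spans(pred_tags)
        tp += len(true_spans & pred_spans)
        n_true += len(true_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [["B-PER", "L-PER", "O", "U-LOC"]]
y_pred = [["B-PER", "L-PER", "O", "O"]]
print(entity_f1(y_true, y_pred))  # one of two gold entities found, no false positives
```

If you need per-type scores, the same span sets can be grouped by entity type before counting.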