BertForTokenClassification with IOB2 Tagging

Dear members of this forum,

I am using BertForTokenClassification for named entity recognition. The labels are encoded in the beginning-inside-outside tagging format (IOB2, to be precise). My overall setup works. However, I am observing two issues for which I don’t know the proper solution:

  1. To obtain the predicted labels, I apply argmax to the logits returned by the model. However, the model sometimes predicts an “I” tag (e.g. I-LOC) directly after an “O” tag, which violates the format, since a “B” tag (e.g. B-LOC) is expected first. Of course, I could interpret an “I” after an “O” as a “B”, or interpret the “O” in front of the “I” as a “B”, and choose whichever performs better. However, I wonder whether there is a method (perhaps a modified argmax approach) where such a result cannot occur by construction.

  2. Sometimes I observe “B” and “I” tags at positions where the attention mask is 0, which means I get a prediction for an input token that does not exist. My approach so far is to ignore such positions completely. However, I wonder here as well whether there is a better strategy.
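
To make both points concrete, here is a minimal sketch of what a "modified argmax" could look like: a Viterbi-style search over the logits that only allows transitions valid in IOB2, and that skips every position where the attention mask is 0. This is not something BertForTokenClassification provides; the function names (is_allowed, constrained_decode) and the toy logits are made up for illustration, and I assume the logits are given as plain nested lists for a single sequence. Special tokens such as [CLS] and [SEP] usually have attention mask 1, so they would probably need to be excluded separately.

```python
import math


def is_allowed(prev_label, label):
    """IOB2 rule: I-X may only follow B-X or I-X; everything else is free."""
    if label.startswith("I-"):
        entity = label[2:]
        return prev_label in ("B-" + entity, "I-" + entity)
    return True


def constrained_decode(logits, attention_mask, labels):
    """Return the highest-scoring IOB2-valid tag sequence for one sentence.

    logits: list of per-token score lists, one score per label.
    attention_mask: list of 0/1; positions with 0 get the tag None.
    labels: label names in the same order as the logit columns.
    """
    real = [t for t, m in enumerate(attention_mask) if m == 1]
    tags = [None] * len(attention_mask)
    if not real:
        return tags

    n = len(labels)
    # The sequence start behaves like a preceding "O": no I-X allowed first.
    score = [
        logits[real[0]][j] if is_allowed("O", labels[j]) else -math.inf
        for j in range(n)
    ]
    backpointers = []
    for t in real[1:]:
        new_score, pointers = [], []
        for j in range(n):
            candidates = [i for i in range(n) if is_allowed(labels[i], labels[j])]
            best = max(candidates, key=lambda i: score[i])
            pointers.append(best)
            new_score.append(score[best] + logits[t][j])
        backpointers.append(pointers)
        score = new_score

    # Backtrack from the best final label.
    j = max(range(n), key=lambda i: score[i])
    tags[real[-1]] = labels[j]
    for t, pointers in zip(reversed(real[:-1]), reversed(backpointers)):
        j = pointers[j]
        tags[t] = labels[j]
    return tags


# Toy example (made-up numbers): plain argmax would give O, I-LOC, I-LOC
# plus a spurious tag on the padded last position.
example_labels = ["O", "B-LOC", "I-LOC"]
example_logits = [
    [3.0, 0.1, 0.3],
    [0.2, 1.0, 3.0],
    [0.1, 0.2, 2.5],
    [5.0, 9.0, 9.5],  # padding position, attention mask 0
]
example_mask = [1, 1, 1, 0]
example_tags = constrained_decode(example_logits, example_mask, example_labels)
print(example_tags)
```

Summing raw logits instead of log-probabilities selects the same path, because log-softmax only subtracts a per-position constant. Whether this beats simply rewriting an “I” after an “O” as a “B” would still have to be measured on a validation set.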

Thank you very much in advance.


I think I found a solution to the first problem I described. The aggregation_strategy option of
TokenClassificationPipeline offers several strategies for dealing with such inconsistencies.
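
For reference, a minimal sketch of how the option can be passed. The helper build_ner_pipeline and the model path in the usage comment are hypothetical placeholders; the strategy names are the ones I know from the transformers documentation ("none", "simple", "first", "average", "max"), and as far as I understand, the last three require a tokenizer that provides word boundaries (a fast tokenizer).

```python
AGGREGATION_STRATEGIES = ("none", "simple", "first", "average", "max")


def build_ner_pipeline(model_name_or_path, aggregation_strategy="simple"):
    """Create a token-classification pipeline with the given strategy."""
    if aggregation_strategy not in AGGREGATION_STRATEGIES:
        raise ValueError(
            f"Unknown aggregation_strategy {aggregation_strategy!r}; "
            f"expected one of {AGGREGATION_STRATEGIES}"
        )
    # Imported here so the check above works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_name_or_path,
        aggregation_strategy=aggregation_strategy,
    )


# Usage (hypothetical path to a fine-tuned BertForTokenClassification model):
# ner = build_ner_pipeline("path/to/your-finetuned-ner-model", "simple")
# entities = ner("Some input sentence.")
```

With any strategy other than "none", the pipeline returns grouped entities rather than per-token tags, so the raw “I after O” sequences are no longer visible in the output.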