Empty entity string when using TokenClassificationPipeline

Hello, I’ve run into an issue with the TokenClassificationPipeline: I’m getting empty strings as the entity classification for some tokens whenever I use any aggregation strategy other than 'none'. With 'none', the pipeline picks the max value from the raw score vectors as described (and is actually correct); the downside is that every word is then broken into its constituent subwords. I was thinking of writing my own aggregator for the output, to at least stitch the subwords back together, but the built-in aggregation also does a good job of grouping multi-word expressions, so ideally I’d like to figure out why it’s producing empty classifications.
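For what it’s worth, the subword-merging part is easy enough to do by hand over the 'none' output. A minimal sketch (keeping the first sub-token’s label and score, roughly what the 'first' strategy does at the word level):

```python
# Post-hoc aggregator over the aggregation_strategy="none" output:
# merge WordPiece continuations ("##"-prefixed tokens) into the
# preceding entry, keeping the first sub-token's label and score.

def merge_subwords(entities):
    """Merge '##'-prefixed sub-tokens into the preceding entity dict."""
    merged = []
    for ent in entities:
        if ent["word"].startswith("##") and merged:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]         # extend the character span
        else:
            merged.append(dict(ent))         # copy so the input isn't mutated
    return merged

# the two sub-tokens of "sweetener" from the 'none' output below
raw = [
    {"entity": "F", "score": 0.99968886, "index": 6, "word": "sweet", "start": 29, "end": 34},
    {"entity": "F", "score": 0.99971086, "index": 7, "word": "##ener", "start": 34, "end": 38},
]
print(merge_subwords(raw))
# → [{'entity': 'F', 'score': 0.99968886, 'index': 6, 'word': 'sweetener', 'start': 29, 'end': 38}]
```

That handles subwords, but not the multi-word grouping the pipeline already does well, which is why I’d rather understand the aggregation behaviour.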

When the model does give a classification for a token it’s generally very accurate, so I’m confused as to why aggregation sometimes falls short, even for words that aren’t split into subwords ('artificial' below, for example, is a single token in the raw output yet gets an empty entity_group).

Below is an example of the outputs for the sentence "we need some more artificial sweetener for our coffee":

With 'first' aggregation:

[{'entity_group': 'Z',
  'score': 0.99955326,
  'word': 'we',
  'start': 0,
  'end': 2},
 {'entity_group': 'A',
  'score': 0.999987,
  'word': 'need',
  'start': 3,
  'end': 7},
 {'entity_group': 'Z',
  'score': 0.99968517,
  'word': 'some',
  'start': 8,
  'end': 12},
 {'entity_group': 'N',
  'score': 0.999977,
  'word': 'more',
  'start': 13,
  'end': 17},
 {'entity_group': '',
  'score': 0.99999094,
  'word': 'artificial',
  'start': 18,
  'end': 28},
 {'entity_group': 'F',
  'score': 0.99971086,
  'word': 'sweetener',
  'start': 29,
  'end': 38},
 {'entity_group': 'Z',
  'score': 0.9994038,
  'word': 'for our',
  'start': 39,
  'end': 46},
 {'entity_group': 'F',
  'score': 0.9999933,
  'word': 'coffee',
  'start': 47,
  'end': 53}]

With 'none' aggregation:

[{'entity': 'Z',
  'score': 0.99955326,
  'index': 1,
  'word': 'we',
  'start': 0,
  'end': 2},
 {'entity': 'A',
  'score': 0.999987,
  'index': 2,
  'word': 'need',
  'start': 3,
  'end': 7},
 {'entity': 'Z',
  'score': 0.99968517,
  'index': 3,
  'word': 'some',
  'start': 8,
  'end': 12},
 {'entity': 'N',
  'score': 0.999977,
  'index': 4,
  'word': 'more',
  'start': 13,
  'end': 17},
 {'entity': 'A',
  'score': 0.99999094,
  'index': 5,
  'word': 'artificial',
  'start': 18,
  'end': 28},
 {'entity': 'F',
  'score': 0.99968886,
  'index': 6,
  'word': 'sweet',
  'start': 29,
  'end': 34},
 {'entity': 'F',
  'score': 0.99971086,
  'index': 7,
  'word': '##ener',
  'start': 34,
  'end': 38},
 {'entity': 'Z',
  'score': 0.9998388,
  'index': 8,
  'word': 'for',
  'start': 39,
  'end': 42},
 {'entity': 'Z',
  'score': 0.9989687,
  'index': 9,
  'word': 'our',
  'start': 43,
  'end': 46},
 {'entity': 'F',
  'score': 0.9999933,
  'index': 10,
  'word': 'coffee',
  'start': 47,
  'end': 53}]
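To pin down which prediction is being lost, I cross-checked the two outputs by character span (excerpted below): the empty entity_group on 'artificial' lines up exactly with the raw 'A' prediction, same score and offsets, so the label is clearly there before aggregation and disappears during it.

```python
# Excerpts of the two outputs above, cross-referenced by character span to
# find which raw label sits under the aggregated entry with the empty group.
grouped = [{"entity_group": "", "score": 0.99999094, "word": "artificial", "start": 18, "end": 28}]
raw = [
    {"entity": "N", "score": 0.999977,   "index": 4, "word": "more",       "start": 13, "end": 17},
    {"entity": "A", "score": 0.99999094, "index": 5, "word": "artificial", "start": 18, "end": 28},
    {"entity": "F", "score": 0.99968886, "index": 6, "word": "sweet",      "start": 29, "end": 34},
]
for g in grouped:
    if g["entity_group"] == "":
        covered = [r["entity"] for r in raw if r["start"] >= g["start"] and r["end"] <= g["end"]]
        print(g["word"], "->", covered)
# prints: artificial -> ['A']
```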

Hopefully this makes it clearer what is happening.

Any ideas and help would be greatly appreciated.

Thanks!

As a supplement to this: the model I finetuned was a DistilBERT model.