Token classification (NER): tokenizer problems at inference

I just fine-tuned a BERT model for token classification (NER), and it works great!

However, I noticed a nuance with the tokenizer: when I use my model for inference, the resulting entities come back as subword pieces prefixed with "##", presumably because of the subword (WordPiece) tokenizer.

To fix this, I tried the `aggregation_strategy` parameter of the pipeline. It accepts five values: "none", "simple", "first", "average", and "max". With "none" or "simple" the code runs, but I still get subwords with "##". With "first", "average", or "max", I get the following error:

```
    word_entities.append(self.aggregate_word(word_group, aggregation_strategy))
  Python\Python39\lib\site-packages\transformers\pipelines\token_classification.py", line 336, in aggregate_word
    word = self.tokenizer.convert_tokens_to_string([entity["word"] for entity in entities])
TypeError: 'NoneType' object is not iterable
```
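For context, this is roughly the post-processing I was hoping aggregation would do for me. A minimal sketch (`merge_subwords` is a hypothetical helper I wrote for illustration, not part of `transformers`) that glues BERT's "##"-prefixed continuation pieces back onto the preceding token:

```python
def merge_subwords(tokens):
    """Merge WordPiece sub-tokens back into whole words.

    BERT-style tokenizers mark continuation pieces with a leading "##";
    this strips that prefix and appends the piece to the previous word.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # continuation piece: glue onto previous word
        else:
            words.append(tok)
    return words

print(merge_subwords(["Hug", "##ging", "Face", "in", "New", "York"]))
# → ['Hugging', 'Face', 'in', 'New', 'York']
```

Doing this by hand works for display, but I'd rather use the pipeline's built-in aggregation if I can get it to run.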

Any idea how to fix this?

Thank you!