Help with prediction results for a Named Entity Recognition task

I get your point, but unfortunately PhoBERT does not have a fast version of its tokenizer.

Therefore, I wrote my own function to do what was suggested:

One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in 🤗 Transformers by setting the labels we wish to ignore to -100. In the example above, if the label for @HuggingFace is 3 (indexing B-corporation), we would set the labels of ['@', 'hugging', '##face'] to [3, -100, -100].
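A minimal sketch of the quoted idea in plain Python, assuming you already know how many subtokens each word produces (for example `len(tokenizer.tokenize(word))`, as in the loop in this post), which also works with a slow tokenizer such as PhoBERT's. The names `align_labels_first_subtoken` and `IGNORE_INDEX` are hypothetical. The value -100 works because it is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels_first_subtoken(
    word_labels: List[int], subtoken_counts: List[int]
) -> List[int]:
    """Expand word-level labels to subtoken level.

    The first subtoken of each word keeps the word's label; every
    continuation subtoken gets IGNORE_INDEX so the loss skips it.
    """
    if len(word_labels) != len(subtoken_counts):
        raise ValueError("need one subtoken count per word")
    aligned: List[int] = []
    for label, count in zip(word_labels, subtoken_counts):
        if count < 1:
            raise ValueError("every word must produce at least one subtoken")
        aligned.append(label)
        aligned.extend([IGNORE_INDEX] * (count - 1))
    return aligned


# "@HuggingFace" -> ['@', 'hugging', '##face'] (3 subtokens), label 3
print(align_labels_first_subtoken([3], [3]))  # [3, -100, -100]
# three words split into 1, 3 and 2 subtokens
print(align_labels_first_subtoken([0, 3, 0], [1, 3, 2]))  # [0, 3, -100, -100, 0, -100]
```

The result covers only the sentence's own subtokens; any special tokens the tokenizer adds around the sentence still need their own -100 entries.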

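    from tqdm import tqdm

    for i, label in tqdm(enumerate(examples["labels"]), total=len(examples["labels"])):
        steps = []  # positions of non-first subtokens in sentence i
        batch = 0   # extra subtokens produced by the words seen so far
        for index, value in enumerate(examples["token"][i]):
            len_to_compare = len(tokenizer.tokenize(value))  # subtokens for this word
            if len_to_compare > 1:
                # every subtoken after the first one should get label -100;
                # positions count from 0 and do not include a leading <s>
                steps += list(range(index + batch + 1, index + batch + len_to_compare))
                batch += len_to_compare - 1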
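A sketch of the same index computation rewritten over precomputed subtoken counts, with one assumption made explicit: stored positions only line up with `input_ids` if they account for special tokens. When special tokens are added, PhoBERT's tokenizer wraps a single sentence as `<s> ... </s>`, so positions counted from 0 over the words alone would be off by one if applied directly to `input_ids`. This may or may not be what happens in the code above, but it is worth checking. The helpers `continuation_positions` and `labels_from_positions` and their `offset`/`n_suffix` parameters are hypothetical, and the sketch assumes no truncation.

```python
from typing import List


def continuation_positions(subtoken_counts: List[int], offset: int = 0) -> List[int]:
    """Indices of every non-first subtoken, following the loop in the post.

    `offset` shifts all indices by the number of special tokens placed
    before the sentence (1 for a leading <s>; 0 reproduces the loop).
    """
    steps: List[int] = []
    batch = 0  # extra subtokens produced by earlier words
    for index, count in enumerate(subtoken_counts):
        if count > 1:
            start = offset + index + batch + 1
            steps.extend(range(start, start + count - 1))
            batch += count - 1
    return steps


def labels_from_positions(
    word_labels: List[int],
    subtoken_counts: List[int],
    offset: int = 1,
    n_suffix: int = 1,
) -> List[int]:
    """Build labels with the same length as input_ids (no truncation assumed).

    Special tokens and continuation subtokens get -100; the first
    subtoken of each word gets that word's label.
    """
    if len(word_labels) != len(subtoken_counts) or min(subtoken_counts, default=1) < 1:
        raise ValueError("need one subtoken count >= 1 per word")
    n_sub = sum(subtoken_counts)
    labels = [-100] * (offset + n_sub + n_suffix)
    ignored = set(continuation_positions(subtoken_counts, offset))
    words = iter(word_labels)
    for pos in range(offset, offset + n_sub):
        if pos not in ignored:
            labels[pos] = next(words)
    return labels


print(continuation_positions([1, 3, 2]))            # [2, 3, 5]
print(continuation_positions([1, 3, 2], offset=1))  # [3, 4, 6]
print(labels_from_positions([0, 3, 0], [1, 3, 2]))
# [-100, 0, 3, -100, -100, 0, -100, -100]
```

A quick sanity check is that the label list has the same length as `input_ids` and that the number of entries other than -100 equals the number of words. A word for which `tokenizer.tokenize` returns nothing would also shift every later index in the original loop, which is why `labels_from_positions` rejects zero counts.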

I simply store the indices that my loop collects in `steps` as the positions to ignore; however, my results got worse.

[Screenshot, 2021-05-17 22:29:09]
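On the prediction side, a sketch of one more thing worth checking, assuming the model was trained with the first-subtoken scheme quoted above: evaluation should then be done per word, reading the prediction at each word's first subtoken and skipping continuation subtokens and special tokens, rather than scoring every subtoken. This is not necessarily the cause of the worse results. The helper `word_level_predictions` and its `offset`/`n_suffix` parameters are hypothetical; `subtoken_preds` would be, for example, the argmax over the model's logits for one sentence.

```python
from typing import List, Sequence


def word_level_predictions(
    subtoken_preds: Sequence[int],
    subtoken_counts: List[int],
    offset: int = 1,
    n_suffix: int = 1,
) -> List[int]:
    """Return one predicted tag id per word, read at its first subtoken.

    `subtoken_preds` has one tag id per input_ids position; `offset` and
    `n_suffix` are the numbers of special tokens before and after the
    sentence (1 and 1 for "<s> ... </s>").
    """
    limit = len(subtoken_preds) - n_suffix  # index of the first trailing special token
    preds: List[int] = []
    pos = offset
    for count in subtoken_counts:
        if count < 1:
            raise ValueError("every word must produce at least one subtoken")
        if pos >= limit:
            break  # sequence was truncated: later words have no prediction
        preds.append(int(subtoken_preds[pos]))
        pos += count
    return preds


# positions: <s>  w0  w1a  w1b  w1c  w2a  w2b  </s>
print(word_level_predictions([0, 0, 3, 4, 4, 0, 0, 0], [1, 3, 2]))  # [0, 3, 0]
```

The word-level predictions can then be compared against the original word-level labels, so that the -100 positions never enter the metric.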