I have a trained RoBERTa model with a Byte Level BPE Encoding algorithm, which I want to benchmark on a custom NER dataset.
Each sample looks as follows:
Text: John is playing football
Labels: B-PER O O O
The text can be run through the tokenizer to generate subword tokens. However, the number of subword tokens may differ from the number of word-level tokens, and I don't know how to align the labels accordingly.
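A common approach (not the only one) is to use the `word_ids()` method that Hugging Face fast tokenizers expose on their output: it returns, for each subword token, the index of the word it came from (or `None` for special tokens like `<s>`/`</s>`). You can then give the word's label to its first subword and mark the remaining subwords with `-100`, which `CrossEntropyLoss` ignores by default. Below is a minimal sketch of that alignment step; the `word_ids` list is hand-written to simulate what a byte-level BPE tokenizer might produce for "John is playing football" (assuming "football" splits into two pieces), since the exact split depends on your tokenizer:

```python
def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Map word-level NER labels onto subword tokens.

    word_ids:     one entry per subword token, as returned by a fast
                  tokenizer's BatchEncoding.word_ids(); None marks
                  special tokens.
    word_labels:  one label per original whitespace-delimited word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens get -100 so the loss ignores them.
            aligned.append(-100)
        elif wid != previous:
            # First subword of a word keeps the word's label.
            aligned.append(word_labels[wid])
        elif label_all_subwords:
            # Optionally label continuation subwords too,
            # converting B- to I- so spans stay well-formed.
            lab = word_labels[wid]
            aligned.append("I-" + lab[2:] if lab.startswith("B-") else lab)
        else:
            # Otherwise ignore continuation subwords in the loss.
            aligned.append(-100)
        previous = wid
    return aligned

# Simulated word_ids: <s>, John, is, playing, foot+ball, </s>
word_ids = [None, 0, 1, 2, 3, 3, None]
labels = ["B-PER", "O", "O", "O"]
print(align_labels(word_ids, labels))
# → [-100, 'B-PER', 'O', 'O', 'O', -100, -100]
```

For benchmarking you would typically evaluate only at positions where the aligned label is not `-100` (i.e. one prediction per original word), so the metric is computed against the same word-level labels you started with.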