Word-Specific Classification (custom token classification?)

My question might sound trivial, but I want to ensure I’m on the right track.

My task: I have sentences containing target words, each with its start-end character indices and a label (3 possible labels in total).
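For concreteness, one example in my dataset looks roughly like this (the field and label names are just illustrative):

```python
example = {
    "text": "The bank raised interest rates again.",
    # character-level start/end offsets of each target word, end-exclusive
    "targets": [
        {"start": 4, "end": 8, "label": "LABEL_A"},    # "bank"
        {"start": 16, "end": 24, "label": "LABEL_B"},  # "interest"
    ],
}
```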

I am approaching the problem by customizing the classic run_token_classification.py script. During data preprocessing, I set the labels of all tokens that are not part of a target word to -100. During training, the data is collated by DataCollatorForTokenClassification and passed to BertForTokenClassification. Intuitively, this should work because the cross-entropy loss ignores positions labeled -100 (the default ignore_index), so it is calculated only for the target words. Am I right?
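In case it helps, here is a minimal sketch of the label alignment I mean, using a fast tokenizer's offset_mapping (the label names and the preprocess function are placeholders, not from the actual script):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
label2id = {"LABEL_A": 0, "LABEL_B": 1, "LABEL_C": 2}  # my 3 labels

def preprocess(example):
    enc = tokenizer(example["text"], truncation=True, return_offsets_mapping=True)
    # Start with every position ignored by the loss ...
    labels = [-100] * len(enc["input_ids"])
    for target in example["targets"]:
        for i, (start, end) in enumerate(enc["offset_mapping"]):
            if start == end:
                continue  # special tokens like [CLS]/[SEP] have (0, 0) offsets
            # ... and label only subwords inside a target word's character span
            if start >= target["start"] and end <= target["end"]:
                labels[i] = label2id[target["label"]]
    enc["labels"] = labels
    enc.pop("offset_mapping")  # the collator does not need this field
    return enc
```

DataCollatorForTokenClassification also pads the labels with -100 by default (label_pad_token_id=-100), so padding positions are excluded from the loss as well.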

I have also tried customizing the BERT model to extract an embedding for each target word (the sum/mean of its representations in the last four hidden layers) and use it for classification, with similar results.
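Roughly, that variant looks like this — a sketch assuming one target word per example, with a hand-built target_mask marking its subword positions (the class and argument names are mine, not from any library):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class TargetWordClassifier(nn.Module):
    def __init__(self, model_name="bert-base-cased", num_labels=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, target_mask):
        # target_mask: (batch, seq_len), 1 on the target word's subwords, else 0
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states is a tuple of (num_layers + 1) tensors of shape
        # (batch, seq_len, hidden); average the last four layers
        last_four = torch.stack(out.hidden_states[-4:]).mean(dim=0)
        # mean-pool the target word's subword embeddings
        mask = target_mask.unsqueeze(-1).float()
        pooled = (last_four * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.classifier(pooled)  # (batch, num_labels) logits
```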

My main question is: Is my approach correct? Is modifying the script in this way enough?