Apply BertForTokenClassification on partially labeled input

I have a corpus where only some tokens (some adjective + noun pairs) are labeled for a specific task.

How do I use the BertForTokenClassification class so that the loss (and therefore the updates) is computed only on those labeled tokens and not on every single token? If I understand correctly, I can't mask them out in the attention_mask, because that would lead to their context being ignored. I also don't see how labeling every unlabeled token as 0, for example, would make sense, since not all adjective + noun pairs are labeled.
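
To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution for this corpus. It relies on the fact that `torch.nn.CrossEntropyLoss` skips targets equal to its `ignore_index`, which defaults to -100. The helper `mask_unlabeled`, the toy labels, and the random `logits` standing in for model output are all hypothetical and only there for illustration. The attention_mask is left untouched, so unlabeled tokens would still serve as context.

```python
import torch
from torch import nn

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_unlabeled(labels, is_labeled):
    """Hypothetical helper: replace labels of unannotated tokens with IGNORE_INDEX."""
    labels = torch.as_tensor(labels)
    is_labeled = torch.as_tensor(is_labeled, dtype=torch.bool)
    return torch.where(is_labeled, labels, torch.full_like(labels, IGNORE_INDEX))


torch.manual_seed(0)
num_labels = 2

# Stand-in for token-classification logits: (batch=1, seq_len=5, num_labels).
logits = torch.randn(1, 5, num_labels, requires_grad=True)

# Toy data (assumption): only tokens 1 and 2 carry a real annotation.
raw_labels = [[0, 1, 0, 0, 1]]
is_labeled = [[False, True, True, False, False]]
labels = mask_unlabeled(raw_labels, is_labeled)

# Positions whose target is -100 are excluded from the loss.
loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))
loss.backward()

# Sum of absolute gradients per token: zero wherever the label was ignored.
grad_per_token = logits.grad.abs().sum(dim=-1)
print(labels)
print(grad_per_token)
```

With the actual model, the same `labels` tensor would be passed as `model(input_ids=..., attention_mask=..., labels=labels)`. This assumes the installed transformers version computes the token classification loss with a default `CrossEntropyLoss`, which is worth checking in its source.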

Thanks for any help in advance.