Since you are getting an index out of range error, there indeed seems to be a mismatch between the ground-truth labels and the prediction layer of the model. Since you are working on a language classification task, this is probably caused by the tokenizer, which means you are facing both of the problems I mentioned.
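A quick way to confirm the first problem is to check whether any ground-truth label id is outside the range of the model's prediction layer. This is just a sketch with made-up numbers, since I don't know your actual label set or output size:

```python
# Hypothetical ground-truth label ids collected from the dataset
labels = [0, 2, 5, 1, 3]

# Assumed size of the model's classification head (num_labels)
num_labels = 5

# Any label id >= num_labels will trigger an index-out-of-range error
# when the loss indexes into the prediction layer's outputs
bad = [label for label in labels if label >= num_labels]
print(bad)  # → [5]
```

If `bad` is non-empty, either the label mapping is off by one (e.g. labels starting at 1) or the head was built with too few classes.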
Unfortunately, I have no experience with custom tokenizers, but if the model architecture needs special tokens (for example, BERT always needs `[MASK]` for MLM and `[SEP]` for NSP), I believe you will have to include them in the vocabulary of your newly trained tokenizer.
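If you are using the Hugging Face `tokenizers` library, the trainer accepts a `special_tokens` argument that reserves those tokens in the vocabulary. Here is a minimal sketch assuming a WordPiece model like BERT's; the vocab size and training corpus are placeholders:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# WordPiece model, as used by BERT-style architectures
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Reserve the special tokens the architecture expects
trainer = trainers.WordPieceTrainer(
    vocab_size=1000,  # placeholder; use your real target size
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)

# Placeholder corpus; train from your own text iterator or files
tokenizer.train_from_iterator(["hello world", "hello there"], trainer)

# The special tokens now have ids in the vocabulary
print(tokenizer.token_to_id("[MASK]"), tokenizer.token_to_id("[SEP]"))
```

After training, `token_to_id` should return a valid id (not `None`) for every special token, which you can use as a sanity check before wiring the tokenizer into the model.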