Token Classification as a Pre-training Task

Hi all,

I have a huge NER corpus (see the CrossNER paper). I want to pre-train BERT on this dataset with a token classification objective, and then fine-tune the resulting NER-BERT, instead of the standard MLM+NSP BERT, on custom NER datasets.
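To make the idea concrete, here is a rough sketch of the two-stage workflow I have in mind, assuming the Hugging Face `transformers` library and PyTorch. The tiny config, the tag counts (9 and 5), and the random dummy batch are placeholders I made up so that the snippet runs offline. In practice, stage 1 would presumably start from a real checkpoint and loop over the large NER corpus.

```python
import tempfile

import torch
from transformers import BertConfig, BertForTokenClassification

# Tiny, randomly initialised config so the sketch runs without downloads.
# All sizes here are placeholders, not recommended settings.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=9,  # hypothetical tag count of the large NER corpus
)

# Stage 1: train with a token classification objective on the large corpus.
stage1 = BertForTokenClassification(config)
optimizer = torch.optim.AdamW(stage1.parameters(), lr=5e-5)

input_ids = torch.randint(0, 100, (4, 16))  # dummy batch: 4 sequences, 16 tokens
labels = torch.randint(0, 9, (4, 16))       # dummy per-token tag ids

stage1.train()
loss = stage1(input_ids=input_ids, labels=labels).loss  # cross-entropy over tags
loss.backward()
optimizer.step()  # one step only; a real run would iterate over the corpus

# Stage 2: reload the trained encoder with a fresh head for the custom tag set.
with tempfile.TemporaryDirectory() as ckpt_dir:
    stage1.save_pretrained(ckpt_dir)
    stage2 = BertForTokenClassification.from_pretrained(
        ckpt_dir,
        num_labels=5,                  # hypothetical tag count of the custom dataset
        ignore_mismatched_sizes=True,  # re-initialise the classifier; its shape changed
    )
```

The encoder weights carry over from stage 1, while the classification layer is re-initialised because the two tag sets differ in size. What I am unsure about is whether this is a sensible substitute for MLM+NSP pre-training.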

Any suggestions on how to do that?

Thanks,
Nitesh