RoBERTa trained with NSP

I want to run experiments with a RoBERTa model that has been pre-trained on the combined MLM+NSP objective. In the paper, NSP was dropped because it led to lower downstream performance, so the authors never released such a checkpoint. Does anyone know whether one is available in some form, or of an implementation that can replicate it (including the pre-training)? I know transformers provides the building blocks, but my GPU access time is restricted, so there is little room for error. If neither model weights nor an implementation is available, I'd really appreciate a working pre-training routine built on transformers.
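For concreteness, here is the kind of routine I have in mind — a minimal sketch that bolts a BERT-style NSP head onto `RobertaModel`. The class name `RobertaForMLMAndNSP` and the NSP head are my own additions (transformers ships no NSP variant for RoBERTa), so please correct me if there is a better way:

```python
import torch
import torch.nn as nn
from transformers import RobertaConfig, RobertaModel
from transformers.models.roberta.modeling_roberta import RobertaLMHead


class RobertaForMLMAndNSP(nn.Module):
    """RoBERTa body with both an MLM head and a BERT-style NSP head.

    Hypothetical sketch: the 2-way NSP classifier on the pooled <s> token
    is my own addition, not part of transformers' RoBERTa implementation.
    """

    def __init__(self, config: RobertaConfig):
        super().__init__()
        # add_pooling_layer=True gives us pooler_output for the NSP head
        self.roberta = RobertaModel(config, add_pooling_layer=True)
        self.lm_head = RobertaLMHead(config)          # standard MLM head
        self.nsp_head = nn.Linear(config.hidden_size, 2)  # is-next / not-next

    def forward(self, input_ids, attention_mask=None,
                mlm_labels=None, nsp_labels=None):
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        mlm_scores = self.lm_head(outputs.last_hidden_state)
        nsp_scores = self.nsp_head(outputs.pooler_output)

        loss = None
        if mlm_labels is not None and nsp_labels is not None:
            loss_fct = nn.CrossEntropyLoss()  # ignore_index=-100 for MLM pads
            mlm_loss = loss_fct(
                mlm_scores.view(-1, self.roberta.config.vocab_size),
                mlm_labels.view(-1),
            )
            nsp_loss = loss_fct(nsp_scores.view(-1, 2), nsp_labels.view(-1))
            loss = mlm_loss + nsp_loss  # equal weighting, as in BERT
        return loss, mlm_scores, nsp_scores
```

To start from released weights rather than scratch, I assume the body could be swapped in with `RobertaModel.from_pretrained("roberta-base")`, leaving only the NSP head randomly initialized; the sentence-pair batches and masking would then come from a standard data collator. Does this look like a reasonable starting point?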