Next sentence prediction on custom model

Hey there @msamogh

I am facing a problem similar to yours: have you found anything out since you created this thread?
Also, if you happen to know: does this mean that models with the architecture “BertForMaskedLM” were trained ONLY on MLM and not on NSP, so that I would have to train for NSP myself?