Hi @nbroad, interesting question.
What kind of niche domain do you have in mind?
Since there are now several hundred (if not thousands of) NLP datasets, would it be possible to find datasets similar to yours for a preliminary MLM stage, before the final MLM stage on your own data?
I don’t have direct experience with sentence-similarity training, but I once trained a classifier in the multilingual toxic-comment domain (maybe a bit niche), where running MLM on the domain data before fine-tuning did improve performance compared to skipping the MLM step.
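
In case it helps anyone reading along, here is a rough sketch of what the MLM step does to each training example. It is not the code I used. It is plain Python with a toy whitespace tokenizer, and `mask_tokens`, `MASK` and the example sentence are made-up names for illustration. The 15% selection rate and the 80/10/10 split are the usual BERT-style defaults, which I am assuming here. In practice you would more likely use something like `DataCollatorForLanguageModeling` from `transformers` with your model's tokenizer.

```python
import random

MASK = "[MASK]"  # placeholder mask token; real tokenizers define their own


def mask_tokens(tokens, vocab, mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for one MLM training example.

    labels[i] is the original token at positions the model must predict,
    and None everywhere else (those positions would be ignored by the loss).
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:
            # This position was selected for prediction.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with the mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep the original token
        else:
            # Not selected: input is unchanged and there is no label.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy usage on an unlabeled in-domain sentence (hypothetical example text).
tokens = "this comment is rude and toxic".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

The point is that this step only needs raw text from the domain, no labels, so the same routine could be run first on similar public datasets and then on your own data before the supervised fine-tuning.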