Pre-Train BERT (from scratch)

BramVanroy · September 25, 2020, 2:51pm

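Well, as you found, RoBERTa showed that leaving out next sentence prediction (NSP) matches or slightly improves results on downstream tasks. ALBERT then introduced a related but different task, sentence order prediction (SOP), in which both segments come from the same document and the model has to tell whether they are in the original order or swapped. That task did improve performance on downstream tasks.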
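To make the difference between the two objectives concrete, here is a rough sketch of how training pairs could be built for each. It is plain Python, not the code from either paper; the function names, the 50/50 split, and the `True`/`False` labels are my own assumptions for illustration.

```python
import random

def make_nsp_pair(docs, doc_idx, sent_idx, rng):
    """NSP-style pair: the real next sentence, or a sentence from another document."""
    first = docs[doc_idx][sent_idx]
    if rng.random() < 0.5:
        # Positive: the sentence that actually follows in the same document.
        return first, docs[doc_idx][sent_idx + 1], True
    # Negative: a random sentence drawn from a different document.
    other_idx = rng.choice([i for i in range(len(docs)) if i != doc_idx])
    return first, rng.choice(docs[other_idx]), False

def make_sop_pair(docs, doc_idx, sent_idx, rng):
    """SOP-style pair: two consecutive sentences, in order or swapped."""
    first = docs[doc_idx][sent_idx]
    second = docs[doc_idx][sent_idx + 1]
    if rng.random() < 0.5:
        # Positive: original order.
        return first, second, True
    # Negative: same two sentences, order swapped.
    return second, first, False

if __name__ == "__main__":
    # Toy corpus (made up): each document is a list of sentences.
    docs = [
        ["The cat sat down.", "It started to purr.", "Then it fell asleep."],
        ["Stocks fell on Monday.", "Analysts blamed rising rates."],
    ]
    rng = random.Random(0)
    print(make_nsp_pair(docs, 0, 0, rng))
    print(make_sop_pair(docs, 0, 0, rng))
```

Note that an NSP negative can often be spotted from the topic change alone, whereas an SOP negative uses the same two sentences and can only be spotted from their order.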

PS: please don’t post multiple consecutive posts; edit your existing post to add more information instead. The extra notifications are a bit annoying.
