How to train BERT from scratch on a new domain for both MLM and NSP?

tlqnguyen · January 9, 2021, 7:35pm

I’m trying to train a BERT model from scratch using my own dataset. I would like the trained model to have the exact architecture of the original BERT model.

The original paper states: “BERT is trained on two tasks: predicting randomly masked tokens (MLM) and predicting whether two sentences follow each other (NSP). SCIBERT follows the same architecture as BERT but is instead pretrained on scientific text.”

I’m trying to understand how to train the model on the two tasks above. At the moment, I initialize the model as below:

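from transformers import BertConfig, BertForMaskedLM
config = BertConfig()  # assumption: default hyperparameters; the original post does not show how config was built
model = BertForMaskedLM(config=config)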

However, this covers only MLM and not NSP. How can I initialize and train the model with NSP as well?

My guess is that I should do one of the following:

Initialize with BertForPreTraining (for both MLM and NSP),
OR
After training with BertForMaskedLM finishes, initialize the same model and train it again with BertForNextSentencePrediction (but this approach would cost twice the computation and resources…)

I’m not sure which one is the correct way. Or maybe my original approach was fine as it is?
Any insights or advice would be greatly appreciated.

valhalla · January 11, 2021, 6:33am

Hi @tlqnguyen

For MLM and NSP training, you should use the BertForPreTraining class. When you pass labels to forward, it will do MLM, and when you pass next_sentence_label, it’ll do NSP.
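As a rough illustration of the above, here is a minimal sketch of one training step with BertForPreTraining. The tiny config sizes, the random token ids, and the mask token id of 4 are assumptions chosen only so the example runs quickly without a tokenizer; they are not taken from this thread. As far as I know, the combined loss is only returned when both labels and next_sentence_label are passed, so the sketch passes both.

```python
import torch
from transformers import BertConfig, BertForPreTraining

# Tiny, randomly initialized model (sizes are illustrative assumptions,
# not the original BERT hyperparameters).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = BertForPreTraining(config)

batch_size, seq_len = 2, 8
# Random ids stand in for tokenized "sentence A + sentence B" pairs.
input_ids = torch.randint(5, config.vocab_size, (batch_size, seq_len))
# 0 marks sentence A tokens, 1 marks sentence B tokens.
token_type_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]] * batch_size)
attention_mask = torch.ones_like(input_ids)

# MLM labels: -100 is ignored by the loss; only the masked position keeps
# its original token id as the target.
labels = torch.full_like(input_ids, -100)
labels[:, 2] = input_ids[:, 2]
masked_input_ids = input_ids.clone()
masked_input_ids[:, 2] = 4  # hypothetical [MASK] id for this toy vocabulary

# NSP labels: 0 = sentence B really follows sentence A, 1 = B is a random sentence.
next_sentence_label = torch.tensor([0, 1])

outputs = model(
    input_ids=masked_input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,
    next_sentence_label=next_sentence_label,
)

loss = outputs.loss  # sum of the MLM loss and the NSP loss
loss.backward()      # gradients for one pretraining step
```

In a real run, the ids, masks and labels would come from a tokenizer and a data pipeline that builds sentence pairs, rather than from random tensors.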

tlqnguyen · February 6, 2021, 7:36am

Hi @valhalla,

Thank you so much for your suggestion. I have a quick follow-up question on this. When we train with NSP, do the sentences in the corpus need to be labeled as Sentence A or B? Or can I just train on an unannotated corpus?
