I am trying to understand more of the inner workings of BERT in the scenarios discussed below.
Let's say I have the dataset BERT was originally trained on plus a domain-specific dataset; let's call the combination superDataset. What is the difference between the following?
- Train BERT from scratch on superDataset.
- Start with pretrained BERT and continue pretraining (MLM and NSP) on the domain-specific dataset.
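To build intuition for why the two options behave differently, here is a deliberately tiny, hedged illustration (plain Python, no BERT involved): fitting a one-parameter model by gradient descent starting either from a random initialization ("from scratch") or from a weight already near the optimum (a stand-in for pretrained weights). The warm start reaches a lower loss in the same number of passes over the "domain" data, which mirrors why continued pretraining typically needs far less data and compute than training from scratch. All names and numbers here are made up for the sketch.

```python
import random

random.seed(0)

# Toy "domain" data: the true relation is y = 3 * x.
DATA = [(x, 3.0 * x) for x in range(1, 6)]

def train(w, epochs, lr=0.01):
    """SGD on squared error for the one-parameter model y_hat = w * x."""
    for _ in range(epochs):
        for x, y in DATA:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

def loss(w):
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)

# "From scratch": random initialization, like an untrained BERT.
scratch_w = train(random.uniform(-5.0, 5.0), epochs=2)

# "Pretrained": start near the optimum, like loading published BERT weights.
warm_w = train(2.9, epochs=2)

print(loss(warm_w) < loss(scratch_w))  # warm start ends up closer to optimal
```

Of course, real BERT pretraining is vastly more complex (Transformer layers, WordPiece vocabulary, MLM/NSP objectives), but the same budget argument applies: from-scratch training needs the full superDataset and enormous compute, while continued pretraining only has to nudge already-good weights toward the domain distribution.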
I am new to the NLP world, so I apologize if this is a beginner question or if I am in the wrong spot. I am specifically looking for recommendations of clear papers that explain this well.