Train from scratch vs. further pre-training/fine-tuning with MLM and NSP

Hello all!

I am trying to understand more of the inner workings of BERT in the scenarios discussed below.

Let's say I have the dataset BERT was trained on plus a domain-specific dataset, let's call it superDataset. What is the difference between the following:

  • Train BERT from scratch on superDataset.
  • Start from pre-trained BERT and continue training with MLM and NSP on the domain-specific dataset.

I am new to the NLP world, so I apologize if this is a beginner question or if I am in the wrong spot. I am specifically looking for clear papers someone could recommend that explain this well.

Thanks everyone!

Hi :slight_smile:

First of all, do not apologize for asking questions; this forum is designed exactly for that purpose.
Training from scratch is usually called pre-training, and its purpose is to give the model general linguistic “knowledge”. That means we would probably not want to pre-train the model with superDataset, because we need loads of data in order to pre-train an LLM.
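To make that difference concrete, here is a minimal sketch (assuming you are using the Hugging Face transformers library) of the two starting points: a randomly initialized BERT that would have to be pre-trained from scratch, versus the published pre-trained checkpoint.

```python
from transformers import BertConfig, BertForMaskedLM

# Randomly initialized BERT: it knows nothing about language yet and would
# need a huge corpus (and a lot of compute) to pre-train from scratch.
config = BertConfig()  # default bert-base-sized architecture
model_from_scratch = BertForMaskedLM(config)

# Published checkpoint: the weights already encode the general linguistic
# knowledge learned during BERT's original pre-training.
model_pretrained = BertForMaskedLM.from_pretrained("bert-base-uncased")
```

Everything below (fine-tuning or further pre-training) starts from the second model.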

What we often do instead is take a pre-trained LLM (such as BERT), which has already “seen” general dependencies and relationships in the language, and then feed it the domain-specific dataset. We adjust the weights of the LLM, i.e. we fine-tune the model to our needs.
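As a rough illustration of that fine-tuning step, here is a sketch of adapting the pre-trained encoder to a downstream task (binary classification, purely as an example). `tokenized_train` is a placeholder for your own tokenized, labelled domain dataset, and the hyperparameters are arbitrary, not recommendations.

```python
from transformers import (BertForSequenceClassification, Trainer,
                          TrainingArguments)

# Reuse the pre-trained encoder weights; only the classification head is new.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-domain-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# tokenized_train: your own tokenized, labelled domain dataset (placeholder).
trainer = Trainer(model=model, args=args, train_dataset=tokenized_train)
trainer.train()  # updates all weights: encoder + classification head
```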

One more thing to know is that MLM and NSP are pre-training tasks; we generally do not use them in the process of fine-tuning. There has been some research on performing further pre-training on a domain-specific dataset to achieve higher performance during fine-tuning. If you are interested, you can have a look there.
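If you do want to try that further pre-training (sometimes called domain-adaptive pre-training), a rough sketch using only the MLM objective could look like the following. `tokenized_domain_corpus` is a placeholder for your unlabeled domain text, and the settings are illustrative. (Continuing NSP as well would require building sentence-pair examples yourself, e.g. with BertForPreTraining, which has both heads.)

```python
from transformers import (AutoTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# The collator masks 15% of tokens on the fly, reproducing the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-domain-mlm",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

# tokenized_domain_corpus: your unlabeled domain text, tokenized (placeholder).
trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=tokenized_domain_corpus)
trainer.train()
```

After this step you would still fine-tune the adapted model on your labelled task data, as in the earlier sketch.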