Using MLM and NSP to fine-tune BERT for question answering

Hello,

I am testing different pre-trained BERT (SQuAD) models for question answering. For this example, I’ll use deepset/deberta-v3-large-squad2 for ease of explanation. I am realizing that domain-specific training data (i.e., data specific to my company) would help improve performance. However, annotating question-answer pairs is time-consuming and costly for us.
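For reference, this is roughly how I am running the models right now (the question/context pair below is just a made-up example):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/deberta-v3-large-squad2")

# Made-up question/context pair, only to illustrate the setup
result = qa(
    question="Who founded the company?",
    context="Acme Corp was founded in 1990 by Jane Doe in Springfield.",
)
print(result["answer"], result["score"])
```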

I know that the original BERT model was pre-trained in a self-supervised way with masked language modeling (MLM) and next sentence prediction (NSP). Also, from my understanding, most BERT models applied to specific tasks (such as QA) add just one task-specific layer on top of the base encoder.
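If I inspect the SQuAD2 checkpoint, the task-specific part does appear to be a single linear span-prediction layer sitting on top of the encoder (assuming I am reading the architecture correctly):

```python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("deepset/deberta-v3-large-squad2")

# The "QA" head: one linear layer mapping each token's hidden state to start/end logits
print(model.qa_outputs)

# Everything below it is the shared base encoder
print(type(model.deberta))
```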

My idea is to fine-tune the base (encoder) layers of the deepset/deberta-v3-large-squad2 model with the MLM and/or NSP objective on our unlabeled text, and then re-attach the “QA” layer(s), in the hope of adapting the model to my domain while keeping the QA “knowledge” from that last layer and, ultimately, keeping manual QA labeling to a minimum (since we do have a large amount of unlabeled text to work with).
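Concretely, I was picturing something like the sketch below: run MLM-style fine-tuning on our unlabeled text, then copy the adapted encoder weights back under the original QA head. The dataset here is a placeholder, the hyperparameters are guesses, and I am not sure the MLM head is even the right objective for this particular checkpoint, so please treat this as pseudocode for my plan rather than something I know is correct:

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "deepset/deberta-v3-large-squad2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Step 1: MLM fine-tuning of the encoder on unlabeled, in-domain text.
# (Loading with AutoModelForMaskedLM drops the QA head and attaches an MLM head.)
mlm_model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Placeholder for our unlabeled company documents
domain_text = Dataset.from_dict({"text": ["...unlabeled company document...", "..."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = domain_text.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="deberta-domain-mlm", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

# Step 2: re-attach the original QA head on top of the domain-adapted encoder.
qa_model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
qa_model.deberta.load_state_dict(mlm_model.deberta.state_dict())
qa_model.save_pretrained("deberta-domain-qa")
tokenizer.save_pretrained("deberta-domain-qa")
```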

I am fairly new to NLP with BERT, so I am wondering if I am on the right track with this approach. If not, any suggestions for how to fine-tune the model with minimal annotated training data would be very helpful!