Using MLM and NSP to fine-tune BERT for question answering


I am testing different pre-trained BERT-style (SQuAD) models for question answering. For this example, I’ll use deepset/deberta-v3-large-squad2 for ease of explanation. I am realizing that domain-specific training data (data specific to my company) would help improve performance. However, annotating question-answer pairs is time-consuming and costly for us.

I know that the original BERT model was pre-trained in a self-supervised way with masked language modeling (MLM) and next sentence prediction (NSP). Also, from my understanding, most BERT models applied to specific tasks (such as QA) add only a small task-specific head on top of the base architecture.
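To make sure I understand the MLM objective correctly, here is a minimal sketch of the standard BERT masking scheme (roughly 15% of positions become prediction targets; of those, 80% are replaced with [MASK], 10% with a random token, 10% left unchanged). The token id 103 and vocab size 30522 are just the bert-base-uncased conventions, used here for illustration:

```python
import random

MASK_ID = 103        # [MASK] token id in the bert-base-uncased vocab (assumption)
VOCAB_SIZE = 30522   # bert-base-uncased vocab size (assumption)

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """BERT-style MLM masking: pick ~mask_prob of positions as targets;
    of those, 80% become [MASK], 10% a random token, 10% stay as-is.
    Returns (inputs, labels); labels are -100 (ignored by the loss)
    at positions that are not prediction targets."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token visible
    return inputs, labels
```

I believe this is essentially what `DataCollatorForLanguageModeling(mlm=True)` in the Hugging Face transformers library does for you in batches, so in practice I would use that rather than hand-rolling the masking.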

My idea is to fine-tune the base layers of the deepset/deberta-v3-large-squad2 model with the MLM and/or NSP objective on our unlabeled text, and then re-attach the original QA head. The hope is to adapt the model to my domain while keeping the QA “knowledge” in the final layer, and ultimately to keep manual QA labeling to a minimum (since we do have a large amount of unlabeled text to work with).
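The “re-attach the QA head” step could be done as state-dict surgery: keep the head weights from the original checkpoint and take every encoder weight from the domain-adapted MLM model. A sketch of that merge, using plain dicts to stand in for PyTorch state dicts (in practice these would come from `model.state_dict()` and go back in via `load_state_dict()`; `qa_outputs` is the head name used by Hugging Face `*ForQuestionAnswering` classes, but I’m not certain it applies to every model):

```python
def merge_state_dicts(qa_state, adapted_state, head_prefix="qa_outputs"):
    """Build a state dict that keeps the QA head from the original
    SQuAD checkpoint and takes every other (encoder) weight from the
    domain-adapted MLM model. Extra MLM-only keys in adapted_state
    (e.g. the LM head) are simply not carried over."""
    merged = {}
    for name, weight in qa_state.items():
        if name.startswith(head_prefix):
            merged[name] = weight               # keep the trained QA head
        else:
            merged[name] = adapted_state[name]  # use the adapted encoder
    return merged

# Toy illustration with fake "weights":
qa = {"deberta.layer.0.w": 1, "qa_outputs.weight": 2}
adapted = {"deberta.layer.0.w": 9, "lm_head.weight": 7}
print(merge_state_dicts(qa, adapted))
```

I realize this is only a sketch of my proposed mechanism, not an endorsement that it will work well; I’d be glad to hear if there is a more standard way to do this kind of domain adaptation.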

I am fairly new to NLP with BERT, so I am wondering whether I am on the right track with this approach. If not, any suggestions for how to fine-tune the model with minimal annotated training data would be very helpful!