Fill-mask and classification at the same time

Hi, I was wondering whether one can easily train both masked language modeling and sentence classification at the same time.

So the goal is to use a pre-trained model and keep masking inputs from my domain-specific dataset, while at the same time using the start token to make a classification prediction. I would combine the two losses and perform backpropagation, roughly as in the sketch below.
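A self-contained toy version of what I have in mind (the random tensor here just stands in for a real encoder’s output, and all names are illustrative):

import torch
import torch.nn as nn

# Toy stand-in for a transformer encoder's output: (batch, seq_len, hidden)
vocab_size, hidden, num_classes, seq_len = 100, 16, 2, 8
encoder_out = torch.randn(1, seq_len, hidden, requires_grad=True)

mlm_head = nn.Linear(hidden, vocab_size)   # token-level head for masked LM
cls_head = nn.Linear(hidden, num_classes)  # sentence-level head on the start token

mlm_logits = mlm_head(encoder_out)         # (1, seq_len, vocab_size)
cls_logits = cls_head(encoder_out[:, 0])   # (1, num_classes), start-token position

mlm_labels = torch.randint(vocab_size, (1, seq_len))
cls_labels = torch.tensor([1])

loss = nn.functional.cross_entropy(
    mlm_logits.view(-1, vocab_size), mlm_labels.view(-1)
) + nn.functional.cross_entropy(cls_logits, cls_labels)
loss.backward()  # one backward pass through the combined loss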

The original BERT architecture was trained on both the Masked Language Modeling and Next Sentence Prediction tasks. The latter is basically sentence classification, so you can simply use the original architecture. You can find it in modeling_bert.py; it’s called BertForNextSentencePrediction. You can change the classification task simply by deciding to give the model different labels!
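For example (a minimal sketch; depending on your transformers version the label argument may be labels or next_sentence_label):

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Treat the binary NSP head as a generic two-class sentence classifier:
# the meaning of labels 0/1 is whatever your dataset says it is.
encoding = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
outputs = model(**encoding, labels=torch.LongTensor([1]))
print(outputs.loss, outputs.logits)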

Thank you very much for the answer.

Is there also an (easy) way to take pretrained models that were originally trained only on MLM and extend them with an additional prediction task?

I think the procedure is the same, but you use pre-trained weights at model creation by calling

model = BertForNextSentencePrediction.from_pretrained(
    model_args.model_name_or_path, […])

You can use the weights of a model trained only on MLM; you simply get a warning that some weights are initialized from scratch (the weights of the classification head), and that’s exactly what you want!
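For instance (the checkpoint path is just a placeholder for your MLM-only model):

from transformers import BertForNextSentencePrediction

# Loading a checkpoint that was trained only on MLM: the classification
# head is missing from it, so transformers initializes those weights from
# scratch and prints a warning, which is exactly the expected behavior.
model = BertForNextSentencePrediction.from_pretrained("path/to/your-mlm-checkpoint")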

EDIT: I’m assuming you’re using BERT; I don’t know whether the same model structure exists for other models. You should look in the models folder here. Probably there is a class named [Name of the model]ForSequenceClassification or something similar!
It’s possible that you have to mix the MLM and sequence classification heads yourself, though; see the sketch below!
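Something like this, as an untested sketch of mixing the two heads on one shared encoder (the class name and label arguments are made up here):

import torch.nn as nn
from transformers import BertModel

class BertForMLMAndClassification(nn.Module):
    # Hypothetical module: one shared encoder, an MLM head plus a
    # sequence-classification head, and a summed loss.
    def __init__(self, model_name, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.mlm_head = nn.Linear(hidden, self.bert.config.vocab_size)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask=None, mlm_labels=None, cls_labels=None):
        seq = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        mlm_logits = self.mlm_head(seq)          # (batch, seq_len, vocab)
        cls_logits = self.classifier(seq[:, 0])  # classify from the [CLS] position
        loss = None
        if mlm_labels is not None and cls_labels is not None:
            ce = nn.CrossEntropyLoss()  # use -100 in mlm_labels for unmasked positions
            loss = ce(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1)) \
                 + ce(cls_logits, cls_labels)
        return loss, mlm_logits, cls_logits

The same pattern should carry over to other model families by swapping BertModel for the matching encoder class (or AutoModel).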

Ah okay, thanks, that is what I was looking for. I am using RoBERTa, which does not seem to offer this option, if I looked it up correctly.