Fill-mask and classification at the same time

Hi, I was wondering whether one can easily train both masking and sentence classification at the same time.

So the goal is to take a pre-trained model and keep doing masked language modeling on my domain-specific dataset, while at the same time using the start token to make a prediction. I would combine the two losses and perform backpropagation on the sum.
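To make "combine the two losses" concrete, here is a minimal sketch in plain Python of the arithmetic only. The function names and the cls_weight parameter are made up for illustration, and the -100 "ignore" label follows the PyTorch / transformers convention for positions that were not masked. In practice a framework would compute both terms and handle backpropagation.

```python
import math

# Label for token positions that were NOT masked (PyTorch / transformers convention).
IGNORE_INDEX = -100


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mlm_loss(token_logits, token_labels):
    """Mean cross-entropy over the masked positions only."""
    losses = [
        cross_entropy(logits, label)
        for logits, label in zip(token_logits, token_labels)
        if label != IGNORE_INDEX
    ]
    return sum(losses) / len(losses)


def combined_loss(token_logits, token_labels, class_logits, class_label, cls_weight=1.0):
    """MLM loss plus a weighted sentence-classification loss (hypothetical helper)."""
    return mlm_loss(token_logits, token_labels) + cls_weight * cross_entropy(class_logits, class_label)


# Toy example: 3 token positions over a 4-word vocabulary; only position 1 was masked.
token_logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
token_labels = [IGNORE_INDEX, 0, IGNORE_INDEX]
class_logits = [0.0, 0.0]  # two classes, predicted from the start token
class_label = 1

total = combined_loss(token_logits, token_labels, class_logits, class_label)
print(total)
```

The weight on the classification term is a design choice: 1.0 simply adds the two losses, and a smaller value keeps the model closer to pure MLM training.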

The original BERT architecture was trained on both the Masked Language Modeling and the Next Sentence Prediction tasks. The latter is basically sentence classification, so you can simply use the original architecture. In the transformers library it’s called BertForNextSentencePrediction (BertForPreTraining is the variant that keeps both the MLM and the NSP heads). You can change the classification task simply by giving the model different labels!

Thank you very much for the answer.

Is there also an (easy) way to take pretrained models that were originally trained only on MLM and train them with an additional prediction task?

I think the procedure is the same, but you load the pre-trained weights when creating the model by calling:

model = BertForNextSentencePrediction.from_pretrained(
    model_args.model_name_or_path, […])

You can use the weights of the same model trained only for MLM. You simply get a warning that some weights are initialized from scratch (the weights of the classification layer), and that’s what you want!

EDIT: I’m assuming you’re using BERT; I don’t know if the same model structure exists for other models. You should look in the models folder here. Probably there is a class named [Name of the model]ForSequenceClassification or something similar!
It’s possible that you have to mix the MLM and sequence classification heads yourself, though!

Ah okay, that is what I was looking for. I am using RoBERTa, which does not seem to offer this option, if I looked it up correctly.
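For RoBERTa, one way to mix the two heads by hand might look like the sketch below. It is only an illustration, not an existing transformers class: RobertaMLMWithClassifier and cls_weight are made-up names, and the tiny random config is there only so the snippet runs without downloading anything. It reuses RobertaForMaskedLM for the MLM loss and adds a linear classifier on the hidden state of the start token.

```python
import torch
from torch import nn
from transformers import RobertaConfig, RobertaForMaskedLM


class RobertaMLMWithClassifier(nn.Module):
    """Hypothetical wrapper: a RoBERTa MLM model plus a linear head on the start token."""

    def __init__(self, mlm_model, num_labels, cls_weight=1.0):
        super().__init__()
        self.mlm = mlm_model
        self.classifier = nn.Linear(mlm_model.config.hidden_size, num_labels)
        self.cls_weight = cls_weight  # assumed weighting between the two losses

    def forward(self, input_ids, attention_mask, mlm_labels, class_labels):
        out = self.mlm(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=mlm_labels,          # -100 marks positions that were not masked
            output_hidden_states=True,  # needed to reach the encoder output
        )
        start_state = out.hidden_states[-1][:, 0]  # last layer, first (start) token
        class_logits = self.classifier(start_state)
        cls_loss = nn.functional.cross_entropy(class_logits, class_labels)
        total_loss = out.loss + self.cls_weight * cls_loss
        return total_loss, out.logits, class_logits


# Tiny, randomly initialised model so the sketch runs offline.
# For real use, load pretrained weights instead, e.g.
# RobertaForMaskedLM.from_pretrained("roberta-base").
config = RobertaConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64, max_position_embeddings=64,
)
model = RobertaMLMWithClassifier(RobertaForMaskedLM(config), num_labels=3)

# Fake batch: 2 sequences of 8 tokens, each starting with the start token.
input_ids = torch.randint(3, 100, (2, 8))
input_ids[:, 0] = config.bos_token_id
attention_mask = torch.ones_like(input_ids)
mlm_labels = torch.full_like(input_ids, -100)
mlm_labels[:, 3] = input_ids[:, 3]  # pretend position 3 was the masked one
class_labels = torch.tensor([0, 2])

loss, mlm_logits, class_logits = model(input_ids, attention_mask, mlm_labels, class_labels)
loss.backward()  # one backward pass updates the encoder and both heads
```

In real training the masking itself would normally come from something like DataCollatorForLanguageModeling, which replaces tokens and produces the -100 labels; here the batch is faked to keep the example short.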