Hi, I was wondering whether one can easily train both masked language modeling and sentence classification at the same time.
The goal is to start from a pre-trained model and keep masking the input for my domain-specific dataset, while at the same time using the start token ([CLS]) to make a classification prediction. I would combine the two losses and backpropagate through both.
The original BERT architecture was trained on both the Masked Language Modeling and Next Sentence Prediction tasks. The latter is basically sentence classification, so you can simply use the original architecture. You can find it in modeling_bert.py; it’s called BertForNextSentencePrediction. You can change the classification task simply by deciding to give the model different labels!
I think the procedure is the same, but you use pre-trained weights when creating the model by calling:
model = BertForNextSentencePrediction.from_pretrained(
    model_args.model_name_or_path, […])
You can use the weights of the same model trained only for MLM; you simply get a warning that some weights are initialized from scratch (the weights of the classification layer), and that’s what you want!
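For a generic sentence classification task (rather than NSP specifically), the same loading pattern works with BertForSequenceClassification. Here is a minimal sketch; the checkpoint path "my-domain-mlm-checkpoint" and num_labels=2 are just placeholders for your own values:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Placeholder checkpoint: a BERT model further pre-trained with MLM only.
model = BertForSequenceClassification.from_pretrained(
    "my-domain-mlm-checkpoint", num_labels=2
)  # warns that the classifier weights are newly initialized, which is expected

tokenizer = BertTokenizer.from_pretrained("my-domain-mlm-checkpoint")
inputs = tokenizer("An example sentence.", return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([1]))
print(outputs.loss)  # classification loss you can backpropagate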
EDIT: I’m assuming you’re using BERT; I don’t know whether the same model structure exists for other models. You should look in the models folder here. Probably there is a class named [Name of the model]ForSequenceClassification or something similar!
It’s possible that you have to combine the MLM and sequence classification heads yourself, though!
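If you do want to train both objectives jointly, as in the original question, one way is to wrap the encoder with both heads and sum the losses. Below is a minimal sketch, assuming a BERT-style encoder from transformers; JointMlmClassifier, num_labels, and the simple unweighted loss sum are my own illustrative choices, not library classes:

import torch
import torch.nn as nn
from transformers import BertModel

class JointMlmClassifier(nn.Module):
    def __init__(self, model_name_or_path, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name_or_path)
        hidden = self.bert.config.hidden_size
        vocab = self.bert.config.vocab_size
        self.mlm_head = nn.Linear(hidden, vocab)         # token-level predictions
        self.classifier = nn.Linear(hidden, num_labels)  # sentence-level predictions

    def forward(self, input_ids, attention_mask, mlm_labels, cls_labels):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state      # (batch, seq_len, hidden)
        mlm_logits = self.mlm_head(sequence_output)
        cls_logits = self.classifier(sequence_output[:, 0])  # [CLS] / start token

        ce = nn.CrossEntropyLoss()  # ignore_index=-100 skips unmasked positions
        mlm_loss = ce(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1))
        cls_loss = ce(cls_logits, cls_labels)
        return mlm_loss + cls_loss  # combined loss

A single backward() on the summed loss then updates both heads and the shared encoder at once.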