DistilBert for Self-Supervision - switch heads for pre-training: MaskedLM and SequenceClassification


Say I want to train a model for sequence classification, so I define my model as:

from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

My question is: what is the best way to first pre-train this model on a masked language modeling (MLM) task? After pre-training, I would like the model to be trained on the downstream sequence classification task.

My understanding is that I can somehow swap my model's head for a DistilBertForMaskedLM head during pre-training, and then switch back to the classification head for the downstream task. However, I haven't figured out whether this is actually the best approach or how to implement it.
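One way the head swap can be sketched: all DistilBERT task classes share the same encoder (the `distilbert` submodule) and differ only in the head on top. So a checkpoint saved from `DistilBertForMaskedLM` can be reloaded with `DistilBertForSequenceClassification.from_pretrained`: the encoder weights are reused, the MLM head is dropped, and a new classification head is randomly initialised. This is a minimal sketch, assuming a tiny randomly initialised config (the sizes are made up) so it runs offline; with real weights you would start from `DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")` and run an actual MLM training loop where the comment indicates.

```python
import tempfile

import torch
from transformers import (
    DistilBertConfig,
    DistilBertForMaskedLM,
    DistilBertForSequenceClassification,
)

# Tiny made-up config so the sketch runs offline and fast (assumption,
# not a recommended architecture).
config = DistilBertConfig(
    vocab_size=100, dim=32, n_layers=2, n_heads=2, hidden_dim=64,
    max_position_embeddings=64,
)

# 1) Pre-training phase: shared encoder ("distilbert") + MLM head.
mlm_model = DistilBertForMaskedLM(config)
# ... run your masked-language-modelling training loop here ...

# 2) Save the whole checkpoint (encoder and MLM head) to a directory.
ckpt_dir = tempfile.mkdtemp()
mlm_model.save_pretrained(ckpt_dir)

# 3) Fine-tuning phase: reload the same directory with a different head class.
# The "distilbert.*" encoder weights are reused, the MLM head weights are
# ignored, and a fresh classification head is initialised (transformers logs
# a warning listing the newly initialised weights).
clf_model = DistilBertForSequenceClassification.from_pretrained(
    ckpt_dir, num_labels=2
)

# Sanity check: the encoder's word embeddings carried over unchanged.
print(torch.equal(
    mlm_model.distilbert.embeddings.word_embeddings.weight,
    clf_model.distilbert.embeddings.word_embeddings.weight,
))
```

Saving and reloading keeps the two phases decoupled (you can pre-train once and fine-tune many times). If both models live in the same process, copying the encoder directly with `clf_model.distilbert.load_state_dict(mlm_model.distilbert.state_dict())` should have the same effect.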

Does Hugging Face offer a built-in function that takes the input ids and the fraction of (non-padding) tokens to mask, and simply trains the model?
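For the masking part, transformers ships `DataCollatorForLanguageModeling`: given a tokenizer, `mlm=True` and `mlm_probability`, it pads a batch, selects roughly that fraction of the non-special tokens (padding, `[CLS]` and `[SEP]` are excluded), applies BERT's 80% `[MASK]` / 10% random / 10% unchanged replacement, and builds the `labels` tensor. It does not train the model by itself; it is typically passed as `data_collator` to `Trainer` or used in your own `DataLoader`. This is a minimal sketch, assuming a tiny made-up vocabulary written to a temporary file so it runs offline; `BertTokenizer` is used because DistilBERT uses the same WordPiece tokenizer as BERT. In practice you would load the tokenizer of your real checkpoint instead.

```python
import os
import tempfile

import torch
from transformers import BertTokenizer, DataCollatorForLanguageModeling

# Made-up WordPiece vocabulary (assumption) so the sketch runs offline.
# Normally: AutoTokenizer.from_pretrained("distilbert-base-uncased").
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "the", "cat", "sat", "on", "mat", "a", "dog", "ran", "home"]

vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("\n".join(VOCAB) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_path)

# Pads the batch, masks ~15% of the non-special tokens with the 80/10/10
# scheme, and sets labels to -100 wherever no prediction is required.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

texts = ["the cat sat on the mat", "a dog ran home"]
examples = [tokenizer(t) for t in texts]

torch.manual_seed(0)
batch = collator(examples)
print(batch["input_ids"])
print(batch["labels"])
```

Because the masking is redone every time a batch is collated, each epoch sees a different random mask (dynamic masking), which is one reason to mask in the collator rather than once up front.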

I've tried to implement this myself, and while it does seem to work, it is extremely slow. I figured there might already be an existing implementation, rather than trying to optimize my own code.
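The original code isn't shown, so this is only a guess at the cause: hand-written masking is often slow when it loops over tokens in Python. The same BERT-style masking can be done for a whole batch with a few tensor operations. The helper `mask_tokens` below is hypothetical (not a library function), and the token ids in the usage example (`0` = pad, `2`/`3` = special tokens, `4` = mask) are made up for illustration. Labels use `-100` because that is the default `ignore_index` of PyTorch's cross-entropy loss, which the transformers MLM heads rely on.

```python
import torch


def mask_tokens(input_ids, attention_mask, mask_token_id, vocab_size,
                special_token_ids=(), mlm_probability=0.15):
    """Hypothetical helper: vectorised BERT-style masking of a batch.

    Returns (masked_input_ids, labels); labels are -100 at every position
    that should not contribute to the MLM loss.
    """
    labels = input_ids.clone()
    masked = input_ids.clone()

    # Candidates: real tokens only (no padding, no special tokens).
    candidate = attention_mask.bool()
    for tok_id in special_token_ids:
        candidate = candidate & (input_ids != tok_id)

    # Select ~mlm_probability of the candidates as prediction targets.
    prob = torch.full(input_ids.shape, mlm_probability)
    selected = torch.bernoulli(prob).bool() & candidate
    labels[~selected] = -100

    # 80% of the selected positions -> mask token.
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    masked[to_mask] = mask_token_id

    # Half of the remaining 20% (10% overall) -> random token id.
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & selected & ~to_mask)
    random_ids = torch.randint(vocab_size, input_ids.shape)
    masked[to_random] = random_ids[to_random]

    # The last 10% of selected positions keep their original token.
    return masked, labels


# Usage with made-up ids: 0 = pad, 2/3 = special tokens, 4 = mask token.
input_ids = torch.tensor([[2, 5, 6, 7, 8, 5, 9, 3],
                          [2, 10, 11, 12, 13, 3, 0, 0]])
attention_mask = (input_ids != 0).long()

torch.manual_seed(0)
masked, labels = mask_tokens(input_ids, attention_mask, mask_token_id=4,
                             vocab_size=14, special_token_ids=(2, 3))
print(masked)
print(labels)
```

Everything here runs as a handful of batched tensor operations, so the cost grows with the batch size but does not involve a per-token Python loop.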

Thank you in advance.