DistilBert for Self-Supervision - switch heads for pre-training: MaskedLM and SequenceClassification

Hi,

Say I want to train a model for sequence classification, so I define my model as:

model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

My question is: what would be the optimal way to pre-train this model on a masked language modeling task? After pre-training, I would like the model to train on the downstream task of sequence classification.

My understanding is that I can somehow swap the head of my model for a DistilBertForMaskedLM head during pre-training, and then switch back to the classification head for the original downstream task, although I haven't figured out whether this is actually optimal or how to write it.
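For context, here is a rough sketch of the two-stage workflow I have in mind (the checkpoint directory and number of labels are just placeholders):

from transformers import DistilBertForMaskedLM, DistilBertForSequenceClassification

# Stage 1: pre-train with a masked language modeling head
mlm_model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")
# ... run MLM training here ...
mlm_model.save_pretrained("./distilbert-mlm-pretrained")

# Stage 2: reload the now pre-trained transformer weights under a
# classification head; the MLM head is dropped and a fresh classifier is added
cls_model = DistilBertForSequenceClassification.from_pretrained(
    "./distilbert-mlm-pretrained", num_labels=2
)
# ... fine-tune on the sequence classification task ...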

Does Hugging Face offer any built-in function that accepts the input ids and a percentage of (non-pad) tokens to mask, and simply trains the model?

I’ve tried to implement this myself, and while it does seem to work, it is extremely slow. I figured there might already be an implemented solution instead of me trying to optimize my own code.
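For reference, the kind of masking routine I mean looks roughly like this (a minimal sketch, not my actual code; the 15% ratio and function name are just examples):

import torch

def mask_tokens(input_ids, tokenizer, mask_prob=0.15):
    # Keep the original ids as MLM labels
    labels = input_ids.clone()

    # Sample ~mask_prob of the tokens, excluding pad and other special tokens
    special = torch.tensor(
        [tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
         for ids in labels.tolist()],
        dtype=torch.bool,
    )
    prob_matrix = torch.full(labels.shape, mask_prob)
    prob_matrix.masked_fill_(special, 0.0)
    masked = torch.bernoulli(prob_matrix).bool()

    # Only compute the loss on masked positions
    labels[~masked] = -100

    # Replace the selected tokens with [MASK]
    masked_input_ids = input_ids.clone()
    masked_input_ids[masked] = tokenizer.mask_token_id
    return masked_input_ids, labels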

Thank you in advance