Separate LM fine-tuning and classification head training

I have a large text corpus, and a small subset of it that is labelled for a multi-label text classification task. I’ve seen many (excellent!) examples of fine-tuning different models for sequence classification, but I couldn’t find one in which the training is separated into two distinct stages:

  1. Fine-tune a specific language model on a specific corpus (unsupervised).
  2. Train a sequence classification model on top of it, using the labelled subset of the original dataset (supervised).

I assume this is a pretty common (and simple) scenario, but I couldn’t find any relevant docs. If anyone can provide any pointers, it would be much appreciated!

Hi @adamh

You can use the run_language_modeling script here to fine-tune the pre-trained model, for example BertForMaskedLM.
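
For example, something along these lines (a sketch: the corpus path and output directory are placeholders, and the exact flags can differ between versions of the script, so check its --help):

```bash
python run_language_modeling.py \
    --model_type=bert \
    --model_name_or_path=bert-base-uncased \
    --do_train \
    --train_data_file=/path/to/unlabelled_corpus.txt \
    --mlm \
    --output_dir=./lm-finetuned
```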

Then you should be able to load the resulting checkpoint with BertForSequenceClassification, which takes the base model and adds a classification head on top that you can then fine-tune for classification.
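
Something like this (a sketch: the path and num_labels are placeholders, and problem_type needs a reasonably recent transformers version; on older ones, drop it and handle the multi-label loss yourself):

```python
from transformers import BertForSequenceClassification

# Load the MLM-fine-tuned checkpoint: the encoder weights are reused, and a
# fresh, randomly initialised classification head is added on top.
model = BertForSequenceClassification.from_pretrained(
    "./lm-finetuned",  # directory the LM fine-tuning step saved to (placeholder)
    num_labels=5,      # number of labels in your task (placeholder)
    problem_type="multi_label_classification",  # sigmoid + BCE loss per label
)
```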

Would love to see a combined script that shows this working.

Pre-training usually takes much longer, and there are multiple downstream tasks you can use the pre-trained model for, so IMO a combined script wouldn’t be that useful. But feel free to create one if you feel like it :wink:

If I could get it to work, I would. But I can’t, which is why I’d like to see a working example.

Hi @adamh, did you find out how to do this? I also want to leverage a fine-tuned language model for classification. As far as I can tell, when we choose a classification head we must discard the previous head - i.e. in this case our beautifully fine-tuned LM head! So, can we put a classification head on top of this rather than discard it? Or do we need to unfreeze the lower layers of our LM during fine-tuning and let the updates propagate into the base model, so that when we discard the LM head the underlying model has still inherited our fine-tuning?
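
Concretely, this is the two-stage pattern I have in mind (a sketch with placeholder names and paths; my question is whether step 2 keeps the fine-tuned encoder weights or throws them away along with the LM head):

```python
from transformers import BertForMaskedLM, BertForSequenceClassification

# Stage 1: fine-tune the encoder + LM head on the unlabelled corpus, then save.
lm = BertForMaskedLM.from_pretrained("bert-base-uncased")
# ... run MLM fine-tuning on the large corpus here ...
lm.save_pretrained("./lm-finetuned")

# Stage 2: load the same checkpoint into a classification model. Does this keep
# the fine-tuned encoder ("bert.*") weights and only re-initialise the classifier
# head, or does it discard the fine-tuning along with the LM head?
clf = BertForSequenceClassification.from_pretrained("./lm-finetuned", num_labels=5)
```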