Fine-tuning BERT with multiple classification heads

I need to train a model that uses a shared backbone, such as BERT, as a feature extractor, with multiple classification heads on top. This scenario is similar to multi-task learning, but since all of the tasks are classification tasks, I will need multiple classification heads. Does anyone have similar notebook code that I could start with?

Hi SaraAmd, I’m looking at doing something very similar. I currently have ~15 classification models that all use the same language model. I have a feeling that I will have to write my own forward() function in the end. I’m curious whether you’ve made any progress, and if so, whether you could share some wisdom on the subject. I’m also wondering if I need to re-train all of these models at the same time, or if I can extract the classification heads from my existing models and pack them together into a new model with a custom forward() function. If you’ve found any existing literature that could put me on the right path, that’d be awesome too :slight_smile:

Hi, unfortunately I haven’t made any progress, and I haven’t found any literature either. But I also assume that the forward function needs to be implemented, along with a loss function for each classification head (all of them will use cross-entropy, but I’m not sure how to combine them). We can brainstorm together to move forward. My goal is to have one BERT model as the feature extractor and then add n classification heads on top of it to train the model. Is this your goal as well? I get the impression you want to train the models separately.
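For what it’s worth, here is a minimal sketch of that setup in PyTorch: one shared encoder, a `ModuleList` of linear heads, and a forward() that sums the per-head cross-entropy losses. Note the encoder below is a toy stand-in (embedding + mean pooling) just so the example runs on its own; in practice you would swap it for something like `transformers.AutoModel.from_pretrained("bert-base-uncased")` and feed the pooled/[CLS] output to the heads. The class and argument names are my own, not from any library.

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """One shared encoder with n classification heads (sketch).

    The toy encoder (embedding + mean pooling) is a stand-in so the
    example is self-contained; replace it with a real BERT backbone
    and use its pooled output as `features`.
    """
    def __init__(self, vocab_size, hidden_size, num_labels_per_head):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, hidden_size)  # stand-in for BERT
        # One linear head per task; each task may have a different label count.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, n) for n in num_labels_per_head]
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, input_ids, labels=None):
        # Mean-pool token embeddings into one feature vector per example.
        features = self.encoder(input_ids).mean(dim=1)
        logits = [head(features) for head in self.heads]
        if labels is None:
            return logits
        # labels has shape (batch, num_heads); the simplest way to combine
        # the heads is to sum their cross-entropy losses (they could also
        # be weighted per task).
        loss = sum(self.loss_fn(l, labels[:, i]) for i, l in enumerate(logits))
        return loss, logits

# Usage sketch: three heads with 3, 5, and 2 labels respectively.
model = MultiHeadClassifier(vocab_size=100, hidden_size=32,
                            num_labels_per_head=[3, 5, 2])
input_ids = torch.randint(0, 100, (4, 10))
labels = torch.stack([torch.randint(0, n, (4,)) for n in [3, 5, 2]], dim=1)
loss, logits = model(input_ids, labels)
```

Summing the losses means one `loss.backward()` trains the backbone and all heads jointly; if one task dominates, per-task weights on the sum are the usual fix.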

Yes, that is exactly my goal. Currently I can just barely fit most of my models onto my GPU at inference time, but the goal is a smaller GPU footprint so that I can process data in batches instead. I don’t mind writing the forward function; I will likely begin tinkering with it next week. I just need to figure out how to extract the classification heads and pickle them so that I can import them into one large model.
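On extracting the heads: rather than pickling whole models, one option is to save just each head’s `state_dict` and load those weights into the heads of the combined model. A sketch, assuming each trained model exposes its head as a `classifier` attribute (as `BertForSequenceClassification` does; the toy `SingleTaskModel` below is a hypothetical stand-in for your trained models):

```python
import torch
import torch.nn as nn

def extract_head(model):
    """Return a copy of the model's classification-head weights."""
    return {k: v.clone() for k, v in model.classifier.state_dict().items()}

# Hypothetical stand-in for a separately trained single-task model,
# which exposes its head as `classifier` over the encoder's pooled output.
class SingleTaskModel(nn.Module):
    def __init__(self, hidden_size, num_labels):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

# Two "already trained" models (weights here are just random for the sketch).
trained_a = SingleTaskModel(16, 3)
trained_b = SingleTaskModel(16, 5)

# The extracted dicts are plain tensors, so torch.save()/torch.load()
# works on them directly if you need to move them between machines.
saved_heads = [extract_head(trained_a), extract_head(trained_b)]

# Pack the saved weights into fresh heads for the combined model.
heads = nn.ModuleList([nn.Linear(16, 3), nn.Linear(16, 5)])
for head, saved in zip(heads, saved_heads):
    head.load_state_dict(saved)
```

One caveat: this only makes sense if all the original models share the same (or a compatible) backbone, since the heads were trained against that backbone’s feature space; fine-tuned backbones that have drifted apart will need joint re-training anyway.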

As I mentioned, I have all my models trained separately already :slight_smile: But if someone has pre-written code to do multi-head training all at once, I don’t mind re-training if it means re-using code. I’m happy either way. I’ll share code with you as soon as I start putting pen to paper.