My name is Robin, and I’m currently doing an internship. I’d like to ask you a question regarding fine-tuning a BERT model.
I’m working on a multi-label classification task with 301 labels. I’m using a BERT model with Adapters and LoRA. My dataset is relatively large (~1.5M samples), but I reduced it to around 1.1M to balance the classes — approximately 5000 occurrences per label.
However, during fine-tuning, I notice that the same few classes always dominate the predictions, despite the dataset being balanced.
Do you have any advice on what might be causing this, or what I could try to fix it?
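For context, here is roughly what the setup looks like (a simplified sketch; the base checkpoint, rank, and target modules are placeholders, assuming Hugging Face `transformers` and `peft`):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# BERT classification head for multi-label: one sigmoid/BCE output per label.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",                        # placeholder checkpoint
    num_labels=301,
    problem_type="multi_label_classification",  # makes transformers use BCEWithLogitsLoss
)

# Wrap the base model with LoRA adapters on the attention projections.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # adapter rank (placeholder value)
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projection module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```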
Since this fine-tuning is large-scale, I think it would be better to first set up a small training loop for testing and run a few short experiments.
For example: train without LoRA, or with the rank raised to 64, to see whether LoRA has a significant impact; borrow hyperparameters from successful examples of similar models and tasks by other people; and so on. One way to wire up such comparison runs is sketched below.
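A sketch, assuming a Hugging Face `Trainer`; the run tags, subset size, and hyperparameters are assumptions, not values from your setup:

```python
from transformers import Trainer, TrainingArguments

def quick_run(model, train_ds, eval_ds, tag, steps=500):
    """Short smoke-test run on a small data subset to compare configurations."""
    args = TrainingArguments(
        output_dir=f"runs/{tag}",
        max_steps=steps,                  # keep each run short and comparable
        per_device_train_batch_size=32,
        learning_rate=2e-5,
        logging_steps=50,
        report_to="none",
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=eval_ds)
    trainer.train()
    return trainer.evaluate()             # compare eval loss across configs

# Hypothetical usage, on ~1% of the data:
#   small_train = train_ds.shuffle(seed=0).select(range(10_000))
#   quick_run(model_no_lora, small_train, eval_ds, "no_lora")
#   quick_run(model_lora_r64, small_train, eval_ds, "lora_r64")
```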
Hello,
Thanks for the advice.
I had already tested on a smaller dataset with only 6 labels for multi-label classification and obtained more than decent results, still using LoRA, which saved about 5 minutes on a 50-minute training run.
Hello. I see. So the problem can probably be narrowed down to a few candidate causes: issues that only appear when there are many classes to learn, or the base model weights being too stubborn to shift…
I’m not very familiar with NLP itself, so I think I can only help with troubleshooting…
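On the troubleshooting side, one quick sanity check is to count how often each class is predicted positive on a validation set. A sketch, assuming `logits` is a tensor of shape (num_examples, 301) from a validation pass and you use a 0.5 decision threshold:

```python
import torch

# logits: (num_examples, 301) tensor from a validation pass (assumed).
probs = torch.sigmoid(logits)
pred_positive = (probs > 0.5).float()

# How often each of the 301 classes is predicted positive.
per_class_counts = pred_positive.sum(dim=0)
top = per_class_counts.topk(10)
print("most-predicted classes:", top.indices.tolist())
print("counts:", top.values.tolist())
# If a handful of classes absorb most of the positives, the bias shows up here.
```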
Would it make sense to try fine-tuning the model directly without using LoRA?
Yeah, I think so. Bugs aside, using LoRA (PEFT) can change the content and quality of learning, for better or worse. Especially when training a model from scratch, it is usually safer to do a first pass without LoRA.
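If you try that, the change is small: load the same classification head and simply skip the PEFT wrapping, so all weights train. A sketch (the learning-rate note is a general rule of thumb, not something from your setup):

```python
from transformers import AutoModelForSequenceClassification

# Same setup as before, but no get_peft_model() call: all weights train.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=301,
    problem_type="multi_label_classification",
)
# Full fine-tuning typically wants a smaller LR than LoRA, e.g. 2e-5 vs 1e-4.
```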
By the way, it occurred to me that bias due to overlapping classes might be at play. With few classes this rarely shows up, but with 301 labels it may well be one of the causes here.
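One way to check for that kind of overlap is to look at the label co-occurrence matrix of your training targets. A sketch, assuming `Y` is the (num_samples, 301) multi-hot label matrix:

```python
import numpy as np

# Y: multi-hot label matrix of shape (num_samples, 301) (assumed).
cooc = Y.T @ Y                        # cooc[i, j] = samples tagged with both i and j
counts = np.diag(cooc).astype(float)  # per-label sample counts (~5000 each here)

# Conditional co-occurrence: P(label j | label i), ignoring the diagonal.
cond = cooc / counts[:, None]
np.fill_diagonal(cond, 0.0)

i, j = np.unravel_index(np.argmax(cond), cond.shape)
print(f"labels {i} and {j} overlap most: P({j}|{i}) = {cond[i, j]:.2f}")
# Heavily overlapping label pairs can bias the classifier toward the
# more frequent or easier member of the pair.
```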