Does a high number of output labels affect the performance of BERT, and how do I handle class imbalance in multiclass text classification?

I am using BERT for multiclass text classification. There are 116 output classes to predict from, and I see a high degree of class imbalance.
As a sample, the record counts for some of the classes look like this:
{'Class A': 975,
 'Class B': 776,
 'Class C': 533,
 'Class D': 412,
 'Class E': 302,
 'Class F': 250,
 'Class G': 207,
 'Class H': 137,
 'Class I': 96,
 'Class J': 51,
 'Class K': 28,
 'Class L': 17,
 'Class M': 7,
 'Class N': 2}
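One mitigation I have been considering for this skew is weighting the loss by inverse class frequency. A minimal sketch using scikit-learn's `compute_class_weight` (the counts below mirror the sample above, not my full 116-class data):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Sample per-class counts from the distribution above (full data has 116 classes)
counts = {'Class A': 975, 'Class B': 776, 'Class C': 533, 'Class D': 412,
          'Class E': 302, 'Class F': 250, 'Class G': 207, 'Class H': 137,
          'Class I': 96, 'Class J': 51, 'Class K': 28, 'Class L': 17,
          'Class M': 7, 'Class N': 2}

# Expand the counts into a flat label array, as compute_class_weight expects
y = np.array([label for label, n in counts.items() for _ in range(n)])
classes = np.array(sorted(counts))  # same order as np.unique(y)

# 'balanced' computes weight = n_samples / (n_classes * count_per_class),
# so rare classes get proportionally larger weights in the loss
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
for c, w in zip(classes, weights):
    print(f'{c}: {w:.2f}')
```

These weights could then be passed into the classifier's loss function so that misclassifying a rare class costs more than misclassifying a frequent one.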

So I have two questions:
Question 1: With 116 output classes to predict from, does the sheer number of classes hurt BERT's performance?

Question 2: My full dataset has a class distribution similar to the one illustrated above. How does this imbalance affect BERT's performance, and if it does, how should I handle it to get reliable predictions?
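For context, one thing I want to make sure of is that the rarest classes (e.g. the one with only 2 records) appear in both the train and validation sets. A minimal sketch of a stratified split with made-up toy labels (using scikit-learn's `train_test_split`, not my real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 5 classes with a long-tailed count distribution
y = np.repeat(np.arange(5), [100, 50, 20, 8, 2])
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# stratify=y keeps the per-class proportions (roughly) equal in both splits,
# so even the 2-record class lands one example in each half
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
```

Without `stratify`, a random split could easily drop a rare class from one side entirely, which would make its metrics undefined.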

Looking forward to answers from the talented community we have here.

Many thanks in advance.