How to deal with Data Imbalance

I want to fine-tune a pre-trained RoBERTa or ELECTRA model for multiclass sentiment classification on an imbalanced dataset. How should I handle this problem?

For class imbalance, one aspect to consider is making sure each batch carries enough signal to cover all the classes, including the rare ones. If some classes rarely appear in a batch, training can degenerate toward predicting only the majority classes.
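One simple way to get that coverage is to oversample the minority classes before shuffling, so random batches are likely to contain every class. This is a minimal sketch using `sklearn.utils.resample`; the label array `y` and placeholder features `X` are made up for illustration:

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical labels for a 3-class sentiment task; class 2 is rare.
y = np.array([0] * 80 + [1] * 15 + [2] * 5)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# Upsample every class to the majority-class count so that shuffled
# batches are likely to contain all classes.
majority = np.bincount(y).max()
X_parts, y_parts = [], []
for c in np.unique(y):
    Xc, yc = resample(X[y == c], y[y == c], replace=True,
                      n_samples=majority, random_state=0)
    X_parts.append(Xc)
    y_parts.append(yc)

X_bal = np.vstack(X_parts)
y_bal = np.concatenate(y_parts)
# Each class now has `majority` examples; shuffle before batching.
```

Oversampling duplicates minority examples, so pair it with shuffling and watch for overfitting on the repeated samples; class weights (shown below in this thread) are an alternative that leaves the data untouched.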

When evaluating test performance, though, keep the real class proportions you would observe in the real world, so your metrics reflect deployment conditions.
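On such a test set, plain accuracy can hide a class the model never predicts, so per-class metrics are worth checking. A small sketch with made-up predictions (the arrays are illustrative, not from a real model):

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Hypothetical predictions on an imbalanced test set; the model
# collapses the minority class 1 into class 0.
y_true = np.array([0] * 50 + [1] * 10 + [2] * 5)
y_pred = np.array([0] * 60 + [2] * 5)

acc = (y_true == y_pred).mean()                      # looks fine (~0.85)
macro_f1 = f1_score(y_true, y_pred, average='macro',
                    zero_division=0)                 # exposes the miss
print(classification_report(y_true, y_pred, zero_division=0))
```

Macro-averaged F1 weights every class equally, which is usually what you want to monitor on imbalanced sentiment data.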

I use a quick snippet to compute class weights from the label distribution and pass them to Keras's `fit`:

import numpy as np
from sklearn.utils import class_weight

# `outputs` holds the integer training labels
class_weights = dict(enumerate(class_weight.compute_class_weight(
    'balanced',
    classes=np.unique(outputs),
    y=outputs)))


history = nlp_model.fit(x_train, y_train,
                        batch_size=self.batch_size,
                        epochs=epochs,
                        class_weight=class_weights,
                        callbacks=self.callbacks,
                        shuffle=True,
                        validation_data=(x_test, y_test))