How to give equal importance to all labels when dealing with unbalanced samples

For example, I have four classes:
class 1: 200 samples
class 2: 100 samples
class 3: 20 samples
class 4: 10 samples
During prediction, most of my samples are predicted as class 1 or class 2 because those classes have many training samples, even when the unlabeled inputs actually belong to class 3 or class 4.
How can I overcome this in HF?
Is it possible to tell the model to concentrate more on class 3 and class 4?
I'd appreciate help soon; thank you in advance.

Hi Para, I’m afraid there is no magic bullet in HF that can solve this problem. It’s a common ML challenge, and as such the standard approaches apply:

1. Under-sampling: randomly delete records from the over-represented classes.
2. Over-sampling: duplicate records in the under-represented classes (see the sketch after this list).
3. Synthetic sampling / data augmentation: in NLP there are quite a few interesting techniques with which you can augment the data in your under-represented classes to increase their record count. Check out this library: GitHub - makcedward/nlpaug: Data augmentation for NLP
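Here is a minimal sketch of option 2, plain random over-sampling, assuming your dataset is a list of dicts with a `"label"` key (the helper name `oversample` is just for illustration):

```python
import random
from collections import defaultdict

def oversample(examples, label_key="label", seed=42):
    """Duplicate records in under-represented classes until each class
    matches the size of the largest one (simple random over-sampling)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        if len(items) < target:
            # sample with replacement to fill the gap for rare classes
            balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced
```

And if you want to try nlpaug for option 3, a quick sketch using its WordNet synonym augmenter (you may need to download the NLTK wordnet corpus first):

```python
# pip install nlpaug
import nlpaug.augmenter.word as naw

aug = naw.SynonymAug(aug_src="wordnet")
# Generate three paraphrased variants of one rare-class example
variants = aug.augment("the device overheats after a firmware update", n=3)
print(variants)
```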

Hope that helps; let me know if you have any questions.

Cheers
Heiko


May I know what is being discussed here? (How can I use class_weights when training?)
I am unable to completely understand the discussion in that thread.

Hi Para, please avoid creating duplicate threads. It makes it harder for other users to find the correct answer to a given problem in the future. I will flag this thread as a duplicate and reply to you in the other thread.

Cheers
Heiko

Duplicate of How can I use class_weights when training? - #9 by para
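For future readers: a common way to tell the model to concentrate on the rare classes is to weight the training loss so that mistakes on them cost more. Below is a minimal sketch using the documented pattern of overriding `Trainer.compute_loss`; the inverse-frequency weights are illustrative values derived from the sample counts in the question (200/100/20/10):

```python
import torch
from torch import nn
from transformers import Trainer

# Illustrative inverse-frequency weights for classes with
# 200 / 100 / 20 / 10 samples: rarer classes get larger weights.
CLASS_WEIGHTS = torch.tensor([1.0, 2.0, 10.0, 20.0])

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Weighted cross-entropy: errors on classes 3 and 4 are penalized more
        loss_fct = nn.CrossEntropyLoss(weight=CLASS_WEIGHTS.to(logits.device))
        loss = loss_fct(logits.view(-1, model.config.num_labels),
                        labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```

`WeightedLossTrainer` is then constructed with the same arguments as a regular `Trainer`.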