How to give equal importance to all labels when dealing with unbalanced samples

For example, I have four classes:
class 1: 200 samples
class 2: 100 samples
class 3: 20 samples
class 4: 10 samples
During prediction, most of my samples are predicted as class 1 or class 2 because those classes have many training samples, even when the unlabeled inputs actually belong to class 3 or class 4.
How can I overcome this in HF?
Is it possible to tell the model to concentrate more on class 3 and class 4?
I'd appreciate help soon; thank you in advance.

Hi Para, I’m afraid there is no magic bullet in HF that can solve this problem. It’s a common ML challenge, and as such the standard approaches apply:

1. Under-sampling: randomly delete records from the over-represented classes.
2. Over-sampling: duplicate records in the under-represented classes (see the sketch after this list).
3. Synthetic sampling / data augmentation: in NLP there are quite a few interesting techniques with which you can augment the data in your under-represented classes to increase their record count. Check out this library: GitHub - makcedward/nlpaug: Data augmentation for NLP
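Here is a minimal sketch of option 2, plain random over-sampling, assuming your dataset is a list of dicts with a `"label"` key (the helper name `oversample` is just for illustration):

```python
import random
from collections import defaultdict

def oversample(examples, label_key="label", seed=42):
    """Duplicate records in under-represented classes until each class
    matches the size of the largest one (simple random over-sampling)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        if len(items) < target:
            # sample with replacement to fill the gap for rare classes
            balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced
```

And if you want to try nlpaug for option 3, a quick sketch using its WordNet synonym augmenter (you may need to download the NLTK wordnet corpus first):

```python
# pip install nlpaug
import nlpaug.augmenter.word as naw

aug = naw.SynonymAug(aug_src="wordnet")
# Generate three paraphrased variants of one rare-class example
variants = aug.augment("the device overheats after a firmware update", n=3)
print(variants)
```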

Hope that helps; let me know if you have any questions.

Cheers
Heiko


May I know what is being discussed here? (How can I use class_weights when training?)
I am unable to completely understand the discussion in that thread.

Hi Para, please avoid creating duplicate threads. It makes it harder for other users to find the correct answer to a given problem in the future. I will flag this thread as a duplicate and reply to you in the other thread.

Cheers
Heiko

Duplicate of How can I use class_weights when training? - #9 by para
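For future readers: a common way to tell the model to concentrate on the rare classes is to weight the training loss so that mistakes on them cost more. Below is a minimal sketch using the documented pattern of overriding `Trainer.compute_loss`; the inverse-frequency weights are illustrative values derived from the sample counts in the question (200/100/20/10):

```python
import torch
from torch import nn
from transformers import Trainer

# Illustrative inverse-frequency weights for classes with
# 200 / 100 / 20 / 10 samples: rarer classes get larger weights.
CLASS_WEIGHTS = torch.tensor([1.0, 2.0, 10.0, 20.0])

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Weighted cross-entropy: errors on classes 3 and 4 are penalized more
        loss_fct = nn.CrossEntropyLoss(weight=CLASS_WEIGHTS.to(logits.device))
        loss = loss_fct(logits.view(-1, model.config.num_labels),
                        labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```

`WeightedLossTrainer` is then constructed with the same arguments as a regular `Trainer`.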