How can I use class_weights when training?

I have an unbalanced dataset. When training, I want to pass class_weights so that the update for rare classes is higher than for large classes. How is this possible in HF with PyTorch?

Thanks
Philip

Answering my own question:
Subclass Trainer and override the compute_loss method (see example here).

Sorry I missed your question and didn’t point you to this directly.

No worries. You did on GitHub… :slight_smile: :+1:

Hi @PhilipMay! Do you mind pasting an example, please? I don’t really understand the documentation.

@PhilipMay an example would be highly appreciated.

Overwriting the Trainer can be done as follows (this is also explained in the docs):

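import torch
from torch import nn
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that newer Trainer versions may pass
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss with one weight per class (here: 2 classes);
        # the weight tensor must be on the same device as the logits
        class_weights = torch.tensor([0.2, 0.3], device=logits.device)
        loss_fct = nn.CrossEntropyLoss(weight=class_weights)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss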
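
The weights in the example above are hard-coded for two classes. As a minimal sketch, they could instead be derived from the class counts with the common "balanced" heuristic `total / (num_classes * count)`, which is the same formula scikit-learn uses for its "balanced" class weights. The helper name `inverse_frequency_weights` and the counts are illustrative assumptions, not part of transformers or PyTorch:

```python
def inverse_frequency_weights(counts):
    """Return one weight per class: total / (num_classes * count)."""
    total = sum(counts)
    num_classes = len(counts)
    return [total / (num_classes * c) for c in counts]

# Illustrative counts for a four-class problem (assumed for this sketch)
counts = [200, 100, 20, 10]
weights = inverse_frequency_weights(counts)
print(weights)
```

The resulting list has one entry per class, in label-id order, and could be wrapped in `torch.tensor(weights)` before being passed as `weight=` to `nn.CrossEntropyLoss`. Rare classes receive the largest weights.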

Thanks a million :slight_smile:

Can anyone help me with how to apply this to text classification?
For example, I have four classes:
class 1: 200 samples
class 2: 100 samples
class 3: 20 samples
class 4: 10 samples
During prediction, most of my samples are predicted as class 1 or 2 because those classes have many more samples, even though I have unlabeled samples of class 3 and class 4.
How can I overcome this in HF?
Is it possible to tell the model to concentrate more on class 3 and class 4?
Please help me soon.
Thank you in advance.

Hi @para, the thread above explains exactly what you need to do if you want to use class weights to handle an imbalanced dataset.

But if you are not sure what class weights are, what they do or how to overwrite the Trainer yourself I’d suggest that you take more time first to really understand the concepts and how it works.

You can also have a look at Simpletransformer.ai - it already implements class weight handling. But be aware that blindly trying out something will probably not give you the best results.

Additional note: a common rule of thumb seems to be that the largest class should be at most 10 times bigger than the smallest class. If this is not the case, you can try sampling techniques on your data as suggested here.

Just to elaborate on the sampling techniques @goerlitz is mentioning:

It’s a common ML challenge and the standard approaches for sampling are:

  1. Under-sampling: Randomly delete records from the over-represented classes
  2. Over-sampling: Duplicate records in the under-represented classes
  3. Synthetic sampling/data augmentation: In NLP, there are quite a few interesting techniques with which you can augment your data in the under-represented classes to increase the record count. Check out this library: GitHub - makcedward/nlpaug: Data augmentation for NLP
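
As a rough illustration of option 2, random over-sampling can be sketched in plain Python. The function name `random_oversample` and the toy data are assumptions made for this example; libraries such as imbalanced-learn offer more complete implementations:

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # draw the missing number of examples with replacement
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

# Toy data: class 0 has four examples, classes 1 and 2 have one each
texts = ["a1", "a2", "a3", "a4", "b1", "c1"]
labels = [0, 0, 0, 0, 1, 2]
new_texts, new_labels = random_oversample(texts, labels)
print(Counter(new_labels))
```

Over-sampling should be applied to the training split only; otherwise duplicated examples can end up in both the training and validation sets and inflate the validation scores.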

With regard to using class weights, this tutorial might be helpful for understanding the approach better: How to use class weight in CrossEntropyLoss for an imbalanced dataset? - knowledge Transfer
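
To make the effect of class weights concrete, here is a small pure-Python sketch of a weighted mean cross-entropy. It follows the convention that PyTorch's `nn.CrossEntropyLoss` uses with `weight=` and the default mean reduction: each example's loss is multiplied by the weight of its true class, and the sum is divided by the sum of those weights. The function name and the toy numbers are illustrative assumptions:

```python
import math

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean cross-entropy over a batch.

    logits:  list of per-example lists of raw scores, one per class
    labels:  list of integer class ids
    weights: list with one weight per class
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for scores, y in zip(logits, labels):
        # negative log-softmax of the true class, computed stably
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        nll = log_norm - scores[y]
        weighted_sum += weights[y] * nll
        weight_total += weights[y]
    return weighted_sum / weight_total

# Two examples: the first (class 0) is predicted well, the second (class 1) is not
logits = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]
unweighted = weighted_cross_entropy(logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, labels, [1.0, 3.0])
print(unweighted, weighted)
```

With a larger weight on class 1, the poorly predicted class-1 example dominates the loss, so the gradient update pushes harder on that class.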

Hi @goerlitz, thank you for the reply.
The simpletransformer.ai blog helped me understand what class weights are and how they affect the loss calculation when the samples are unbalanced.
All I need now is a programmatic implementation.
I am using this blog post (Hugging Face DistilBert & Tensorflow for Custom Text Classification. | by Galina Blokh | Geek Culture | Medium) to create a text classification model.
I am also using the Trainer API to train the model.
Sample code:
class Dataset(torch.utils.data.Dataset):
def init(self, encodings, labels=None):
self.encodings = encodings
self.labels = labels
def getitem(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
if self.labels:
item[“labels”] = torch.tensor(self.labels[idx])
return item
def len(self):
return len(self.encodings[“input_ids”])
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)
args = TrainingArguments(
evaluation_strategy=“steps”,
eval_steps=500,
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
num_train_epochs=1,
seed=0,
load_best_model_at_end=True,
)
trainer = Trainer(
model=model,
args=args,
train_dataset=train_dataset,
eval_dataset=val_dataset,
compute_metrics=compute_metrics,)
Can you please tell me where to use the custom trainer in my code?

Hi @para, it is good practice to format pasted code with the preformatted text option and to make sure that the indentation is correct. Otherwise, your code is hard to read and other people are probably less likely to try to understand it. That means: make it as easy as possible for others to help you by giving enough context for the problem, and only include relevant and readable code.
Tip: you can still edit your previous post and format the code example in the right way. (see image)

Now looking at your code…

import torch
from transformers import Trainer, TrainingArguments

class Dataset(torch.utils.data.Dataset):
  def __init__(self, encodings, labels=None):
    self.encodings = encodings
    self.labels = labels
  def __getitem__(self, idx):
    item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    if self.labels is not None:
      item["labels"] = torch.tensor(self.labels[idx])
    # return the item whether or not labels are present
    return item
  def __len__(self):
    return len(self.encodings["input_ids"])

train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)
args = TrainingArguments(
  evaluation_strategy="steps",
  eval_steps=500,
  per_device_train_batch_size=8,
  per_device_eval_batch_size=8,
  num_train_epochs=1,
  seed=0,
  load_best_model_at_end=True,
)
trainer = Trainer(
  model=model,
  args=args,
  train_dataset=train_dataset,
  eval_dataset=val_dataset,
  compute_metrics=compute_metrics,
)

I actually have several questions:

  • What model are you using?
  • What dataset are you using?
    • How many different categories does it have?
    • What is the highest and lowest frequency of categories?
    • How many examples do you have in training and validation set?
  • What evaluation metrics are you using?
  • What results do you get for your metrics at the end of training?
  • Did you look at training and validation loss curves to judge if your model is overfitting?
  • Did you try different values for num_train_epochs and learning_rate?

Actually, training for 1 epoch is only good for getting some preliminary results. You should try training with 2-4 epochs and different learning rates to get better results before trying out class weights.

But to answer your initial question:
You can copy the CustomTrainer code posted above and then replace the line

trainer = Trainer(

with

trainer = CustomTrainer(

That should work.

Thank you. It really helps me.
I am using my customer data, which has a total of 9 labels. The distribution is:
class-1: 223
class-2: 119
class-3: 42
class-4: 37
class-5: 24
class-6: 23
class-7: 16
class-8: 12
class-9: 4
If I give all 9 classes to the model, it treats the last 2 to 3 classes as outliers because they have very few samples.
So my task now is to take only the last four classes and try few-shot classification on those samples.
My assumption is that few-shot classification requires a pre-trained model, so that it already has some knowledge (correct me if I am wrong, and please give any suggestions on how to do few-shot text classification).
For that, I am using (model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=4)).
I am not doing a train/test split because:

  1. I have very few samples.
  2. Even if I do a train/test split and look at the scores on the test data, we can’t conclude that the model performs well, even if it gives a 90 F1 score.
  3. So I am doing a manual eyeball evaluation by giving unlabelled samples to the model.

Please suggest any other approaches to tackle this, other than sampling.
I have another approach in mind:
From classes 1 and 2, take 5 samples at a time, calculate their embeddings, and average them to get a single embedding, then map that embedding to its class.
If we do this for all samples of class 1, we get about 45 embeddings (223 / 5 ≈ 45).
Then the classes are roughly balanced.
Can you suggest how to do this experiment with HF, and do you think it would really work?
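
The grouping idea described above could be sketched as follows. This is only an illustration of the mechanics, with placeholder vectors standing in for real sentence embeddings and a hypothetical helper name `average_in_groups`; whether training on averaged embeddings actually helps is not established in this thread:

```python
def average_in_groups(embeddings, group_size=5):
    """Average consecutive groups of `group_size` vectors into one vector each.
    A trailing group with fewer vectors is averaged as well."""
    pooled = []
    for start in range(0, len(embeddings), group_size):
        group = embeddings[start:start + group_size]
        dim = len(group[0])
        pooled.append([sum(vec[i] for vec in group) / len(group) for i in range(dim)])
    return pooled

# Placeholder 2-dimensional vectors for the 223 samples of class 1
class1_embeddings = [[float(i), 1.0] for i in range(223)]
pooled = average_in_groups(class1_embeddings)
print(len(pooled))
```

Note that this reduces the number of training points for the large classes without adding any information for the small ones, so it behaves much like under-sampling.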

Thank you for the answer

In my dataset, there are a total of 9 classes, and their distribution is
223, 119, 42, 37, 24, 23, 16, 12, 4.
Instead of giving all classes to my model, I am giving it only the last four classes and trying few-shot classification (my assumption is that few-shot learning needs a pretrained model, because the model already has some knowledge and can therefore learn from a few samples). That is the reason I am using an HF model.
Correct me if my assumption is wrong, and please give any suggestions for trying few-shot learning with HF. That would be greatly helpful.
I am not splitting the data into train and validation sets because I have so few samples. I am evaluating my model manually by giving it unlabelled samples.
I am using TFDistilBert for the sequence classification model.

Alright, you have a very imbalanced dataset with 500 examples and 9 classes.

I understand that you want to use it for training a classifier. But I’m not sure what you want to do with it afterwards. Do you have a real-world use case with another set of unlabeled documents that you want to classify?

The point is, if you have fewer than approx. 1000 unlabeled documents to be classified, you should label them manually rather than try to train a classifier. That will be faster, as it should only take a few days to complete.

If you have more than 1000 unlabeled documents I’d say that you should actually take the time to manually label more of them and thus increase the number of examples in your training set.

Why? For training a common classification model you should have at least 100 examples per class (more is better), and the most frequent class should not be more than 10x the size of the least frequent class.
Another option is to aggregate the classes with few examples into a single class. This will increase accuracy etc., but of course you can then no longer differentiate between the merged classes.
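
As a minimal sketch of the aggregation option, classes below a chosen threshold can be relabeled into one catch-all class. The helper name `merge_rare_classes`, the label `"other"`, and the threshold of 100 (taken from the rule of thumb above) are assumptions for this example; the counts are the ones reported earlier in the thread:

```python
from collections import Counter

def merge_rare_classes(labels, min_count=100, other_label="other"):
    """Replace every label that occurs fewer than `min_count` times
    with a single catch-all label."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other_label for lab in labels]

# Rebuild a label list from the class distribution given in the thread
counts = [223, 119, 42, 37, 24, 23, 16, 12, 4]
labels = [f"class-{i + 1}" for i, n in enumerate(counts) for _ in range(n)]
merged = merge_rare_classes(labels)
print(Counter(merged))
```

With this threshold, only class-1 and class-2 keep their own label and the seven smaller classes are merged, which leaves three classes of comparable size.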

Now coming to few-shot learning. Actually, I have no practical experience with few-shot learning, so I cannot point to code or a tutorial. But few-shot learning is a different approach with a different training goal. You can find many good articles that explain how it works, and you can also use the search function of this forum to find discussions like this one.

Anyway, the bottom line is that for few-shot learning you should use a different model (NLI).

However, as I said before, your actual use case is not really clear. If this is for educational purposes, I’d recommend trying out different datasets: one that is suitable for training a regular classifier and a different one that is suitable for few-shot learning. If this is a real-world use case, you can try few-shot learning, but otherwise you should acquire more data.

Could anyone point me in the direction of how I’d go about addressing this issue (that is, using class weights) when working with Tensorflow?