Multi-Label Text Classification

I have a dataset of roughly 44k data points and 1500 labels. I want to use “AutoModelForSequenceClassification” to classify them, and I am able to pass “multi_label_classification” as the problem type. However, the F1 and accuracy scores are quite poor. I suspect it’s because the data is sparse and 0 labels are preferred over 1 labels (since there are far fewer 1s than 0s for every category).

Although this is a broad problem, in particular I was wondering: how can I pass the “pos_weight” parameter to the “BCEWithLogitsLoss” function that the model uses during training, through the Trainer API or TrainingArguments?

I looked at the source code, and it doesn’t seem to accept any parameters for the loss.
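
For context, here is the kind of call I’d like to end up happening inside the model (a minimal standalone sketch; the batch size and the 10x factor are placeholders I made up):

import torch
from torch.nn import BCEWithLogitsLoss

num_labels = 1500
# pos_weight holds one entry per label; entries > 1 penalize missed positives
# more heavily, which should counteract how rare my 1 labels are.
pos_weight = torch.full((num_labels,), 10.0)
loss_fct = BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_labels)                     # raw model outputs
targets = torch.randint(0, 2, (8, num_labels)).float()  # multi-hot labels
loss = loss_fct(logits, targets)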

Hello @ayush-adeptmind,
From my understanding of your question, you want to introduce a weighted loss into your multi-label classification problem.
For such a problem, the best way I’m aware of while using the transformers library is to subclass the Trainer class (create a new trainer class which inherits from the actual Trainer) and override (by defining it again) its “compute_loss” method.
When instantiating BCEWithLogitsLoss there, you should be able to pass in your weight vector.

See the thread “How can I use class_weights when training?” for more precise information and code snippets.

Have a good day!

Adding the exact code for my implementation, based on the source provided. Thanks!

import logging
from typing import Optional

from torch import FloatTensor
from torch.nn import BCEWithLogitsLoss
from transformers import Trainer

class WeightedTrainer(Trainer):
    def __init__(self, *args, class_weights: Optional[FloatTensor] = None, **kwargs):
        super().__init__(*args, **kwargs)
        if class_weights is not None:
            # Move the weights to the device the Trainer will train on.
            class_weights = class_weights.to(self.args.device)
            logging.info("Using multi-label classification with class weights: %s", class_weights)
        # pos_weight=None falls back to the plain, unweighted BCE loss.
        self.loss_fct = BCEWithLogitsLoss(pos_weight=class_weights)

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        Compute the loss with the weighted BCEWithLogitsLoss built in
        __init__ instead of the loss returned by the model itself.
        """
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        try:
            loss = self.loss_fct(outputs.logits.view(-1, model.num_labels),
                                 labels.view(-1, model.num_labels))
        except AttributeError:
            # The model is wrapped (e.g. by DataParallel), so num_labels
            # lives on model.module rather than on the wrapper itself.
            loss = self.loss_fct(outputs.logits.view(-1, model.module.num_labels),
                                 labels.view(-1, model.module.num_labels))

        return (loss, outputs) if return_outputs else loss
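
# Note: the `compute_metrics` passed to the Trainer below is not part of the
# original snippet. A minimal sketch for multi-label evaluation, assuming a
# 0.5 threshold on the sigmoid probabilities and scikit-learn metrics:
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1 / (1 + np.exp(-logits))   # sigmoid: logits -> probabilities
    preds = (probs >= 0.5).astype(int)  # hard 0/1 prediction per label
    return {
        "f1_micro": f1_score(labels, preds, average="micro", zero_division=0),
        "subset_accuracy": accuracy_score(labels, preds),  # exact-match ratio
    }
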
trainer = WeightedTrainer(
    model,
    args,
    train_dataset=ds_enc["train"],
    eval_dataset=ds_enc["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    class_weights=weights
)
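
For reference, the “weights” tensor passed above isn’t defined anywhere in this thread. One way to build it (an assumption on my part, following the pos_weight convention from the PyTorch docs of negatives divided by positives per label, and assuming ds_enc["train"]["labels"] holds multi-hot label vectors) would be:

import torch

# (num_examples, num_labels) multi-hot matrix of the training labels
train_labels = torch.tensor(ds_enc["train"]["labels"], dtype=torch.float)
pos_counts = train_labels.sum(dim=0)             # positives per label
neg_counts = train_labels.shape[0] - pos_counts  # negatives per label
weights = neg_counts / pos_counts.clamp(min=1)   # guard against labels with no positives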