Multiclass vs Multilabel

Hello all,

I’m relatively new to the transformers library and its models.

What I’m trying out is fine-tuning a pretrained model for a classification task.

I understand that BertForSequenceClassification is meant for classical multi-class classification.
However, I’m confused about how to set up a “multi-label” classification task.

Can anyone point me toward how to do multilabel classification?


By multiclass, I mean that the labels are mutually exclusive, so exactly one label is set per example:

  • num_labels = 5
  • labels=[0, 1, 0, 0, 0]


By multilabel, I mean that several labels can be set for the same example:

  • num_labels = 5
  • labels = [1, 0, 0, 1, 0]

Take a look at how transformers.modeling_bert.BertForSequenceClassification is implemented: a CrossEntropyLoss is built into the class, which is why it assumes exactly one correct label per example. You can create a custom class (a PyTorch module) similar to BertForSequenceClassification and apply a multi-label loss of your choice there. You can also look up my blog post, where I show how to build a fully custom model on top of the pre-trained transformers models.
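To make that suggestion concrete, here is a minimal sketch of such a custom module. It is not the library's own implementation. The class name BertForMultiLabelClassification is hypothetical, and the choice of torch.nn.BCEWithLogitsLoss is an assumption (it is one common multi-label loss, not the only option). The encoder is built from a tiny random BertConfig so the snippet runs without downloading weights; for real fine-tuning you would presumably load it with BertModel.from_pretrained instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class BertForMultiLabelClassification(nn.Module):
    """Hypothetical module: BERT encoder + linear head + multi-label loss."""

    def __init__(self, encoder, num_labels, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)
        # Assumption: BCEWithLogitsLoss scores each label independently,
        # so any number of labels can be active for one example.
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs[1]  # pooled [CLS] representation
        logits = self.classifier(self.dropout(pooled))
        if labels is None:
            return logits
        # Targets are multi-hot vectors and must be floats for this loss.
        loss = self.loss_fn(logits, labels.float())
        return loss, logits


# Tiny randomly initialised encoder, for illustration only.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
encoder = BertModel(config)
model = BertForMultiLabelClassification(encoder, num_labels=5)

# Dummy batch of 2 sequences of 8 token ids, with multi-hot labels.
input_ids = torch.randint(0, 100, (2, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([[1, 0, 0, 1, 0], [0, 1, 0, 0, 0]])

loss, logits = model(input_ids, attention_mask=attention_mask, labels=labels)

# At inference time, apply a per-label sigmoid and threshold
# instead of taking an argmax over a softmax.
predictions = (torch.sigmoid(logits) > 0.5).long()
```

The 0.5 threshold is only a starting point; it is usually worth tuning on a validation set.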
