BERT Multilabel - Different Training Dataset For Each Label?

Hi everyone,

I have successfully built a multi-label classifier (10 labels, somewhat balanced) at the sentence level with my own subclass of the transformers library's BertForSequenceClassification.
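
For reference, the subclass essentially just swaps the loss for BCEWithLogitsLoss, so that each label becomes an independent binary decision. A simplified sketch (the class name is illustrative, not my exact code):

```python
import torch
from torch import nn
from transformers import BertForSequenceClassification
from transformers.modeling_outputs import SequenceClassifierOutput

class MultiLabelBert(BertForSequenceClassification):
    """Variant of BertForSequenceClassification that treats the labels as
    independent binary targets instead of mutually exclusive classes."""

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # Call the parent without labels so it skips its own
        # (single-label cross-entropy) loss and only returns logits.
        outputs = super().forward(
            input_ids=input_ids, attention_mask=attention_mask, **kwargs
        )
        loss = None
        if labels is not None:
            # Sigmoid + binary cross-entropy per label.
            loss = nn.BCEWithLogitsLoss()(outputs.logits, labels.float())
        return SequenceClassifierOutput(
            loss=loss,
            logits=outputs.logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
```

Newer versions of transformers can also do this without a subclass by setting problem_type="multi_label_classification" in the model config.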

The classification performance is okay-ish. When I first tested BERT on a binary classification task for a single label in my dataset, it was very beneficial for performance to include adversarial sentences that did not carry that label. So I tried the same approach in this multi-label setup by adding more all-zero-labeled sentences to the dataset, but this made performance worse.

My question is: is it possible to have a different training/evaluation dataset for each label in a multi-label classification setup?

Some more background on the dataset and labels:
The labels of a sentence are indicated by a multi-hot vector with one entry per label, e.g. (0, 0, 0, 1, 0, 0, 0, 0, 0, 0). It’s also possible that no label is present, i.e. the corresponding vector is all zeros.
In total there are 10 labels.
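
Concretely, building the target vector for a sentence looks roughly like this (the label indices are just illustrative):

```python
import torch

NUM_LABELS = 10

def to_multi_hot(active_labels):
    """Map a list of active label indices to a 10-dim 0/1 target vector.
    An empty list yields the all-zeros vector (no label present)."""
    vec = torch.zeros(NUM_LABELS)
    for i in active_labels:
        vec[i] = 1.0
    return vec

print(to_multi_hot([3]))     # one label active
print(to_multi_hot([1, 7]))  # the rarer multi-label case
print(to_multi_hot([]))      # no label present
```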

Thank you!

Just out of curiosity, what does it mean to have “different dataset(s) for each label”?

Right now my model uses a single dataset for training and validation.

This makes it hard for the model to learn representations of each label, for a couple of reasons rooted in the nature of the data:

  • Sentences with more than one active label are somewhat rare (~5 %)
  • Sentences with different labels are mostly very dissimilar

I think the model would perform a lot better if it used label-specific training and validation data that included more difficult cases for each label.

So I am asking whether this is possible with the transformers library, or whether it would not help performance at all. A rough sketch of what I have in mind is below.
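
To make the idea concrete: the only way I can picture "label-specific datasets" working inside a single multi-label model is to attach a per-label mask to every example, saying which labels that example's dataset actually covers, and to zero out the loss on the other labels. This is not a built-in transformers feature, just a sketch of what I mean (names and shapes are illustrative):

```python
import torch
from torch import nn

def masked_bce_loss(logits, targets, label_mask):
    """All arguments are (batch_size, num_labels) tensors. label_mask is
    0/1: 1 where this example's dataset actually observes the label."""
    per_label = nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    )
    per_label = per_label * label_mask  # drop loss for unobserved labels
    # Average only over the (example, label) pairs that were observed.
    return per_label.sum() / label_mask.sum().clamp(min=1.0)

# Toy example: 2 sentences, 4 labels, each sentence drawn from a
# dataset that only annotates 2 of the labels.
logits = torch.randn(2, 4)
targets = torch.tensor([[1., 0., 0., 0.],
                        [0., 0., 1., 0.]])
mask = torch.tensor([[1., 1., 0., 0.],   # sampled for labels 0 and 1
                     [0., 0., 1., 1.]])  # sampled for labels 2 and 3
print(masked_bce_loss(logits, targets, mask))
```

The obvious alternative would be ten separate one-vs-rest binary classifiers, each with its own dataset, at the cost of training and serving ten models.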

I’m also working on a multi-label classifier with BERT right now and have the same question. Have you found anything helpful since you posted this?