I have successfully built a multi-label classifier (10 somewhat balanced labels) at the sentence level with my own subclass of the transformers library's BertForSequenceClassification.
The classification performance is okay-ish. When I first tested BERT on a binary classification task for a single label in my dataset, it was very beneficial for performance to include adversarial sentences that did not carry that label. So I tried the same approach in this multi-label setup by adding more all-zero-labeled sentences to the dataset, but this worsened performance.
My question: is it possible to have a different training/evaluation dataset for each label in a multi-label classification setup?
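To make the question concrete, what I have in mind amounts to keeping one dataset but masking the per-label loss terms, so each label effectively sees its own training set. A rough sketch in plain Python (the `masked_bce` helper is purely illustrative, not part of my actual code):

```python
import math

def masked_bce(logits, targets, mask):
    """Binary cross-entropy summed over labels, skipping masked-out labels.

    mask[i] == 0 means: this sentence is not part of label i's
    training set, so label i contributes nothing to the loss here.
    """
    total = 0.0
    for z, y, m in zip(logits, targets, mask):
        if m == 0:
            continue  # this label ignores the sentence entirely
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the logit
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# Only the first label trains on this sentence; the second is masked out.
loss = masked_bce([0.0, 0.0], [1.0, 0.0], [1, 0])
print(loss)  # -> log(2), the BCE of p=0.5 against target 1
```

This way, extra adversarial 0-labeled sentences could be fed only to the labels they are adversarial for, instead of diluting all 10 heads at once.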
Some more background on the dataset and labels:
In total there are 10 labels. The labels of a sentence are indicated by a 10-dimensional multi-hot vector, e.g. (0, 0, 0, 1, 0, 0, 0, 0, 0, 0). It's also possible that no label is present, i.e. the corresponding vector is all zeros.