Multi-class Classification Basics

Hello,

I have a really basic question about fine-tuning BERT for classification:

I have a dataset of customer reviews with 7 different labels, such as “Customer Service”, “Tariff”, “Provider related”, etc.

The dataset contains 12,700 unlabelled customer reviews, and I have labelled 1,100 of them for my classification task. Now to my questions:

  • Would it be enough to take an existing BERT model and fine-tune it with AutoModelForSequenceClassification on my specific task?

  • Are 1,100 labelled reviews (around 150 per class) enough to train that?

  • What are other approaches?

I am completely new to NLP and have only been working on this for 3 weeks. So far I have learned that more traditional approaches are often outperformed by transformers…

Thanks in advance!


Yes. You can use “bert-base-uncased”, for example.

Yes, that’s the power of transfer learning: we often only need a few examples (on the order of a hundred) per class.

Actually, that’s the most typical approach: fine-tune an xxxForSequenceClassification model using a pre-trained base. You can check out the official tutorial here: https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb
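If a concrete starting point helps, here is a minimal fine-tuning sketch using the Trainer API. The review texts, label ids, and output_dir below are made-up placeholders for illustration; swap in your own 1,100 labelled reviews:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
import torch
from torch.utils.data import Dataset

# Placeholder data -- replace with your labelled reviews and class ids (0-6).
texts = ["The hotline never picks up.", "My new tariff is too expensive."]
labels = [0, 1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=7
)

class ReviewDataset(Dataset):
    """Wraps tokenized reviews and their class ids for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

training_args = TrainingArguments(output_dir="bert-reviews", num_train_epochs=3)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ReviewDataset(texts, labels),
)
trainer.train()

With only ~150 examples per class, it’s worth holding out a small validation split and watching for overfitting after a few epochs.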


Thank you so much for your reply and for helping me move forward. It now looks like I need multi-label classification instead of multi-class… does that make any difference in how I set up the transformer?

If a given review can have more than one label, then it’s indeed a multi-label text classification problem.

The only thing you’ll need to change is setting the problem_type to multi_label_classification when instantiating an xxxForSequenceClassification model. Suppose we have 7 different labels and want to do multi-label classification; then you can, for example, instantiate a BERT model as follows:

from transformers import BertForSequenceClassification

# problem_type switches the loss from cross-entropy to BCEWithLogitsLoss
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=7,
)
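One thing to watch out for, as a quick sketch to complement the snippet above (the example review text and label ids are made up): with problem_type="multi_label_classification" the model uses BCEWithLogitsLoss, so the labels must be multi-hot float vectors rather than integer class ids, and at inference you apply a per-label sigmoid instead of a softmax.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=7,
)

# A review tagged with classes 0 and 2 (out of 7) becomes a multi-hot
# float vector -- BCEWithLogitsLoss expects floats, not integer class ids.
labels = torch.zeros(7)
labels[[0, 2]] = 1.0

inputs = tokenizer("Made-up example review", return_tensors="pt")
outputs = model(**inputs, labels=labels.unsqueeze(0))  # outputs.loss is set

# At inference: per-label sigmoid + threshold, so several labels can fire.
predicted = (outputs.logits.sigmoid() > 0.5).squeeze(0)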

Thank you so much for getting back to me so fast!