Text classification on small dataset (8K)

abercher · July 23, 2021, 9:00am

Hello,

I’m trying to find a good architecture for a text classification model. The domain is a help-desk chatbot whose goal is to book appointments for customers who need machines repaired. The current model only has to classify single utterances into one of 20 categories.

I have around 8K examples in my data set. I’m wondering if there is a recommended transformer-based architecture/model for this kind of task.

I tried a model with a frozen DistilBERT layer followed by a fully connected layer, before a classification layer.

So basically the same architecture as DistilBertForSequenceClassification, presented here:
https://huggingface.co/transformers/_modules/transformers/models/distilbert/modeling_distilbert.html#DistilBertForSequenceClassification
but with the DistilBERT layer frozen.
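
For reference, the frozen set-up described above could look roughly like the sketch below. It is only an illustration: the tiny, randomly initialised config is an assumption made so the snippet runs offline, and the checkpoint name in the comment ("distilbert-base-uncased") is also an assumption, since the post does not say which checkpoint was used.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

NUM_LABELS = 20  # the 20 categories mentioned in the post

# Assumption: a small random config so the sketch runs without downloading weights.
# A real run would more likely load pretrained weights, e.g.
# DistilBertForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=NUM_LABELS)  # checkpoint name assumed
config = DistilBertConfig(n_layers=2, dim=64, hidden_dim=128, n_heads=2,
                          num_labels=NUM_LABELS)
model = DistilBertForSequenceClassification(config)

# Freeze the whole DistilBERT encoder; only the head on top remains trainable.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Names of the parameters that will still receive gradient updates.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With this set-up only the `pre_classifier` (fully connected) and `classifier` layers are updated during training.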

But the results were so-so, so I’m thinking about two things:

Is there something more appropriate than DistilBERT for my set-up?
Should I maybe keep only some layers of DistilBERT frozen and not all of them?

If anyone has a suggestion, I would be glad to hear about it.

Thank you in advance!

ehalit · July 27, 2021, 4:58am

In my experience (I worked with BERT and RoBERTa), not updating the transformer model parameters during fine-tuning resulted in lower accuracy and a slower decrease in the loss value. This might mean that the fully connected layer alone is not enough to model the task at hand. I suggest updating the parameters of the DistilBERT model as well, which is what fine-tuning is for.

I should also note that freezing the first 6 layers of BERT-base did not decrease the accuracy of the model significantly, in my case.
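
A partial freeze along those lines could be sketched as follows for DistilBERT. The helper name `freeze_lower_layers`, the choice to also freeze the embeddings, and `n_frozen=3` (half of DistilBERT's 6 blocks, by analogy with 6 of 12 in BERT-base) are all assumptions for illustration, not settings taken from this thread. The small random config is again only there so the snippet runs offline.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification


def freeze_lower_layers(model, n_frozen, freeze_embeddings=True):
    """Hypothetical helper: freeze the first n_frozen transformer blocks
    (and optionally the embeddings), leaving the rest trainable."""
    if freeze_embeddings:
        for param in model.distilbert.embeddings.parameters():
            param.requires_grad = False
    for block in model.distilbert.transformer.layer[:n_frozen]:
        for param in block.parameters():
            param.requires_grad = False
    return model


# Assumption: tiny random config with DistilBERT's usual depth of 6 blocks.
config = DistilBertConfig(n_layers=6, dim=64, hidden_dim=128, n_heads=2,
                          num_labels=20)
model = freeze_lower_layers(DistilBertForSequenceClassification(config),
                            n_frozen=3)

# Indices of the blocks in which no parameter receives gradients.
frozen_blocks = [
    i for i, block in enumerate(model.distilbert.transformer.layer)
    if not any(p.requires_grad for p in block.parameters())
]
print(frozen_blocks)
```

The upper blocks and the classification head stay trainable, so the model can still adapt to the task while fewer parameters are updated.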
