Unsupervised guided text classification with NN

aermak · March 27, 2022, 6:47pm

Hi, everyone!

Could you please tell me whether there are any techniques for telling an NN which classes to sort texts into when there is no labeled training set, or only a very small one (1-2 instances per class)? For example, is it possible to give some keywords for each class so that the NN clusters texts accordingly? Otherwise, the NN produces classes that are of no interest to me.
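One common way to turn per-class keywords into something an NN can learn from is weak labeling: score each text by how many of each class's seed words it contains, keep only confident matches, and use those as a noisy training set. The sketch below is a minimal, stdlib-only illustration of that idea, not a library API; the class names, the `SEED_KEYWORDS` dictionary and the `weak_label` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical seed keywords per class; in a real project these would be
# written (or curated) for each of the predefined categories.
SEED_KEYWORDS = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "investor"},
    "weather": {"rain", "forecast", "storm", "temperature"},
}


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def weak_label(text, seeds=SEED_KEYWORDS, min_hits=1):
    """Return the class whose seed words occur most often in `text`.

    Returns None (abstain) when no class reaches `min_hits` or when the
    top two classes are tied, so ambiguous texts are left unlabeled.
    """
    counts = Counter(tokenize(text))
    scores = {label: sum(counts[w] for w in words) for label, words in seeds.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else -1
    if best < min_hits or best == runner_up:
        return None
    return best_label


if __name__ == "__main__":
    texts = [
        "The team scored a late goal to win the match.",
        "Shares fell as the stock market opened lower.",
        "I had pasta for dinner.",
    ]
    for t in texts:
        print(weak_label(t), "|", t)
```

The texts that receive a label could then be used to train (or fine-tune) a neural classifier, while abstained texts stay unlabeled; how well this works depends heavily on how distinctive the seed words are.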

My task at hand is to classify texts into 1500 predefined categories. I was able to do it with GLDA (Guided Latent Dirichlet Allocation), but I believe I can achieve better results using an NN.

I would be more than happy if you shared links to models/articles or your thoughts. Thanks in advance.

merve · March 29, 2022, 10:17am

I feel like you could use zero-shot text classification models to label your data, though I don't know whether 1500 categories is too many. Another idea: I recently came across this blog post on using BERT for topic modelling (it's like an extension of using embeddings for topic modelling). The author of the post also maintains a package called BERTopic, which is based on transformers and which you might find useful.
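A zero-shot setup can be sketched as embedding similarity: describe each category in a few words, embed the descriptions once, and assign each text to the most similar description. The sketch below assumes that approach; the `embed` function here is a toy bag-of-words stand-in (an assumption, not a real encoder) that a real setup would replace with a neural sentence encoder, and `LABELS`, `build_label_index` and `zero_shot` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector.

    ASSUMPTION: a real setup would swap in a neural text encoder here;
    the sketch only relies on the text -> vector interface.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_label_index(label_descriptions):
    """Embed each label's short description once, up front."""
    return {label: embed(desc) for label, desc in label_descriptions.items()}


def zero_shot(text, label_index, top_k=3):
    """Rank labels by similarity to the text; no labeled examples are used."""
    text_vec = embed(text)
    scored = [(label, cosine(text_vec, vec)) for label, vec in label_index.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_k]


# Hypothetical label descriptions; with 1500 categories you would write
# (or generate) one short description or keyword list per category.
LABELS = {
    "sports": "sports team match goal league",
    "finance": "finance stock market shares investor",
    "weather": "weather rain storm forecast temperature",
}

index = build_label_index(LABELS)
print(zero_shot("The team won the match after a late goal", index, top_k=2))
```

Precomputing the label index means each new text needs one encoder pass plus one cheap similarity per label. NLI-based zero-shot classifiers, by contrast, run the model once per (text, label) pair, so with 1500 labels their cost per text grows accordingly; this is one reason the label count matters.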

aermak · April 1, 2022, 12:15pm

Merve, thanks for your suggestions! It took me some time to check whether those models are applicable to my task, and I am now certain that I will give them a try.

In addition, I am considering trying two other methods:

Fine-tune a sentiment-analysis BERT model to predict my classes instead. However, I doubt this will give good results, since I have far more classes than the emotional states it works with.
Try active learning, with some smart scheme that makes it easier for assessors to assign one of the 1.5k labels to each text. This option is costly.
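One possible active-learning loop combines uncertainty sampling with a label shortlist: send assessors the texts the current model is least sure about (highest predictive entropy), and show them only the model's top few candidate labels rather than all ~1500. The sketch below assumes the model outputs a probability distribution per text; `select_for_annotation`, its parameters and the example data are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, labels, budget=2, shortlist=3):
    """Uncertainty sampling with a label shortlist for assessors.

    predictions: dict mapping text id -> list of class probabilities
                 (same order as `labels`), e.g. from the current model.
    Returns up to `budget` (text_id, candidate_labels) pairs, most uncertain
    first; candidate_labels are the model's top-`shortlist` guesses, so the
    assessor chooses among a few options instead of the full label set.
    """
    ranked = sorted(predictions.items(), key=lambda kv: entropy(kv[1]), reverse=True)
    batch = []
    for text_id, probs in ranked[:budget]:
        top = sorted(range(len(labels)), key=lambda i: probs[i], reverse=True)[:shortlist]
        batch.append((text_id, [labels[i] for i in top]))
    return batch


labels = ["sports", "finance", "weather", "politics"]
predictions = {
    "t1": [0.97, 0.01, 0.01, 0.01],  # confident: not worth an assessor's time
    "t2": [0.30, 0.28, 0.22, 0.20],  # nearly uniform: most informative
    "t3": [0.50, 0.40, 0.05, 0.05],  # torn between two labels
}
print(select_for_annotation(predictions, labels))
```

After each annotation round, the new labels would be added to the training set, the model retrained, and the selection repeated. The shortlist is a trade-off: it speeds up annotation but can hide the correct label when the model is badly wrong, so an "other / search all labels" fallback is probably worth keeping.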

MahdiA · April 2, 2022, 7:19pm

If you don't have enough labeled data, you can first try data augmentation. You could also look into self-supervised learning, which may help.
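As an illustration of simple text augmentation, the sketch below applies two token-level operations in the spirit of the "EDA" recipe (random word swap and random word deletion) to produce noisy copies of a labeled text that keep its label. It is a minimal stdlib sketch under those assumptions; the function names and default parameters are hypothetical, and richer options such as synonym replacement or back-translation need extra resources not shown here. With only 1-2 examples per class, this mainly multiplies near-duplicates, so it is unlikely to replace real labels.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen word positions, n_swaps times."""
    words = list(words)
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def augment(text, n_variants=4, p_delete=0.1, n_swaps=1, seed=0):
    """Create noisy copies of a labeled text; each copy keeps the original label."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    words = text.split()
    variants = []
    for k in range(n_variants):
        # Alternate between the two operations.
        if k % 2 == 0:
            new = random_swap(words, n_swaps, rng)
        else:
            new = random_deletion(words, p_delete, rng)
        variants.append(" ".join(new))
    return variants


print(augment("the team scored a late goal in the final match"))
```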
