I’m embarking on a project that involves creating a text classification model using Hugging Face’s transformers. The goal is to categorize a diverse dataset into a set of broad, predefined themes. Additionally, the model should be capable of suggesting new themes for entries that don’t fit into the existing categories.
I am not sure whether this should be framed as a classification problem, since the number of classes could be very large, possibly in the hundreds. On the other hand, if I use topic modelling, it may assign distinct themes even to similar text entries.
Please suggest how to approach this.