I’m trying to find a good architecture for a model which has to do text classification. The domain is a chat-bot doing a help-desk. Its goal is to book appointments for customers who need some machines to be repaired. The current model only has to classify single utterances in one of the 20 categories.
I have around 8K examples in my data set. I’m wondering if there is some recommended type of architecture/model based on tranformers for this type of model.
I tried a model with a frozen DistilBERT layer followed by a fully connected layer, before a classification layer.
So basically the same architecture than DistilBertForSequenceClassification presentend here:
but with the DistilBERT layer frozen.
But the results were so so. So I’m thinking about two things:
- Is there something more appropriate than DistilBERT for my set-up?
- Should I maybe keep only some layers of DistilBERT frozen and not all of them?
If anyone has a suggestion, I would be glad to hear about it.
Thank you in advance!