I have tried to use the model `typeform/distilbert-base-uncased-mnli` for zero-shot classification (multi-class, not multi-label), but I am getting very poor results, especially compared to `facebook/bart-large-mnli`. I have tried it both with the zero-shot-classification pipeline and without it, and the results are just as bad either way.
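For reference, this is roughly how I am calling the pipeline (the labels and example sentence below are placeholders, not my actual dataset; in my case there are 13 candidate labels):

```python
from transformers import pipeline

# Placeholder label set for illustration; substitute your own categories.
labels = ["politics", "sports", "technology", "finance", "health"]

classifier = pipeline(
    "zero-shot-classification",
    model="typeform/distilbert-base-uncased-mnli",
)

# multi_label=False => multi-class: scores are softmaxed across labels
# and sum to 1, and labels come back sorted by descending score.
result = classifier(
    "The central bank raised interest rates again this quarter.",
    candidate_labels=labels,
    multi_label=False,
)
print(result["labels"][0], result["scores"][0])
```

Swapping the model string to `facebook/bart-large-mnli` with the same call is the only change between the two runs.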
The test dataset I am classifying has 46 entries across 13 categories. With the DistilBERT MNLI model I get around 15% accuracy, whereas this goes up to 57% with BART-large MNLI.
Has anyone else found such a massive difference in performance when using a distilled model for this task? I assumed it would be comparable, since DistilBERT and BERT have comparable performance on many NLP tasks, but the gap here is far too large.
Also, does anyone have benchmark results for these two models, either on NLI tasks or on zero-shot classification?