New pipeline for zero-shot text classification

joeddav · March 3, 2021, 9:36pm

For long documents, I don’t think there’s an ideal solution right now. If truncation isn’t satisfactory, then the best thing you can do is probably split the document into smaller segments and ensemble the scores somehow.
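As a rough sketch of the "split and ensemble" idea above: the snippet below cuts a document into fixed-size word windows, scores each window, and averages the per-label scores. The helper names (`split_into_segments`, `ensemble_scores`), the 200-word window, and the choice of a plain mean are all assumptions made for illustration, not something the pipeline does for you. `classify` is assumed to be any callable that returns a dict with `"labels"` and `"scores"`, which is the shape the zero-shot pipeline returns for a single input.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def split_into_segments(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive windows of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def ensemble_scores(
    text: str,
    labels: List[str],
    classify: Callable[[str, List[str]], Dict],
    max_words: int = 200,
) -> Dict[str, float]:
    """Average each label's score over all segments of the document.

    `classify(segment, labels)` is assumed to return
    {"labels": [...], "scores": [...]} for a single segment.
    """
    segments = split_into_segments(text, max_words)
    if not segments:
        raise ValueError("text is empty")

    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        result = classify(segment, labels)
        # The returned labels may be sorted by score, so pair them up explicitly.
        for label, score in zip(result["labels"], result["scores"]):
            totals[label] += score

    # Simple mean; max or a length-weighted mean are other options to try.
    return {label: totals[label] / len(segments) for label in labels}


# Possible usage with the real pipeline (downloads a model, so left commented out):
# from transformers import pipeline
# classifier = pipeline("zero-shot-classification")
# print(ensemble_scores(long_document, ["politics", "sports"], classifier))
```

Whether a mean, a max, or something else works best probably depends on whether a label should apply when only one part of the document is about it.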

I do see a lot of high scores (> 0.9) when multi_class = True for a list of custom tags …

Yeah, unfortunately this will just happen sometimes. It’s the reason why multi_class=False is recommended when possible. It’s a lot easier to tell which one of K labels is the correct one than to independently predict each label based on the class name alone, as you do when multi_class=True. You might just have to try out a bunch of examples and see what threshold works best. It’s a really hard problem to tell whether the class name y applies to the sentence x without any training data or additional context. So far this method is the best I’ve encountered, but hopefully we can improve with time.
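One way to make "see what threshold works best" a bit more systematic is sketched below. It assumes you have hand-labeled a small set of examples and already have their independent per-label scores (for instance from running the pipeline with multi_class=True). The helper names `micro_f1` and `best_threshold`, the use of micro-averaged F1 as the selection metric, and the candidate thresholds are illustrative assumptions only.

```python
from typing import Dict, List, Sequence, Set, Tuple


def micro_f1(predictions: List[Set[str]], gold: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (example, label) decisions."""
    tp = sum(len(p & g) for p, g in zip(predictions, gold))
    fp = sum(len(p - g) for p, g in zip(predictions, gold))
    fn = sum(len(g - p) for p, g in zip(predictions, gold))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    scores: List[Dict[str, float]],
    gold: List[Set[str]],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, f1) for the candidate with the highest micro F1.

    scores[i] maps each label to its independent score for example i;
    gold[i] is the hand-assigned set of correct labels for example i.
    Ties keep the earliest candidate.
    """
    best = (candidates[0], -1.0)
    for threshold in candidates:
        predictions = [
            {label for label, score in example.items() if score >= threshold}
            for example in scores
        ]
        f1 = micro_f1(predictions, gold)
        if f1 > best[1]:
            best = (threshold, f1)
    return best


# Toy, made-up scores where everything is above 0.9, as in the question above.
example_scores = [{"a": 0.95, "b": 0.92}, {"a": 0.91, "b": 0.99}]
example_gold = [{"a"}, {"b"}]
print(best_threshold(example_scores, example_gold, [0.5, 0.9, 0.93, 0.97]))
```

With only a handful of labeled examples the chosen threshold will be noisy, so it is worth treating it as a starting point rather than a fixed value.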
