Zero shot classification for long form text

kbridbur · April 15, 2021, 8:36pm

I’m looking to do topic prediction/classification on long form text (podcasts/transcripts) and I’m curious if anyone knows of a model for this? I’ve looked through the existing zero shot classification models but they all appear to be optimized for short form text like questions.

If anyone knows of such a model, I would appreciate a pointer.

lewtun · April 16, 2021, 7:55am

cc @joeddav who is the zero-shot expert here

joeddav · April 20, 2021, 3:07pm

Tbh your best approach is probably just to use one of the existing models and either (1) truncate the longer documents or (2) split them into smaller segments and ensemble the model’s predictions to get an overall label. There might be something more amenable to long sequences, but if there is, I doubt it would do much better than that.
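A minimal sketch of option (2), for illustration only. The helper names `split_into_chunks` and `classify_long_text`, the 200-word chunk size, and the plain averaging of scores are all assumptions, not something the library prescribes. The `classifier` argument is any callable that takes a text and a list of candidate labels and returns a dict with `labels` and `scores`, which is the shape the zero-shot pipeline returns for a single input.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def classify_long_text(
    text: str,
    candidate_labels: List[str],
    classifier: Callable[[str, List[str]], dict],
    max_words: int = 200,
) -> Dict[str, float]:
    """Classify each chunk separately and average the per-label scores.

    Averaging is one simple ensembling choice (an assumption here);
    majority voting or max-pooling would be alternatives.
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("text is empty")

    totals = defaultdict(float)
    for chunk in chunks:
        result = classifier(chunk, candidate_labels)
        # The result lists labels and scores in matching order.
        for label, score in zip(result["labels"], result["scores"]):
            totals[label] += score

    averaged = {label: totals[label] / len(chunks) for label in candidate_labels}
    # Return labels sorted from highest to lowest average score.
    return dict(sorted(averaged.items(), key=lambda kv: kv[1], reverse=True))
```

With transformers installed, something like `classifier = pipeline("zero-shot-classification")` could be passed in as `classifier`. Word-based chunking is only a rough proxy for the model's token limit, so the chunk size may need tuning.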

davidefiocco · November 15, 2021, 10:42pm

Just to clarify, the current default behavior of the library when running ZeroShotClassificationPipeline on very long text is (1), i.e. truncation.

(@joeddav correct me if I’m wrong, that’s what I infer from the transformers.pipelines.zero_shot_classification page of the transformers 4.13.0.dev0 documentation.)

gautamgoel962 · July 15, 2024, 5:25am

Yes, that’s true. I tried the same thing. Would changing the hypothesis help get better results?
