I’m looking to do topic prediction/classification on long-form text (podcasts/transcripts), and I’m curious whether anyone knows of a model for this. I’ve looked through the existing zero-shot classification models, but they all appear to be optimized for short-form text like questions.
If anyone knows of such a model, I would appreciate a pointer.
Tbh, your best approach is probably to just use one of the existing models and either (1) truncate the longer documents or (2) split them into smaller segments and ensemble the model’s predictions to get an overall label. There might be something more amenable to long sequences, but I doubt it would do much better than that if there is.
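Here's a minimal Python sketch of option (2), under some assumptions of mine: the function names are all hypothetical, the 200-word window and 150-word stride are arbitrary choices, and the "ensemble" is just the mean of each label's per-segment score (majority voting over each segment's top label would be another reasonable choice). So the example runs without downloading anything, a toy keyword scorer stands in for a real model. `hf_score_fn` shows how you might plug in a Hugging Face `transformers` zero-shot pipeline instead, assuming it returns a dict with `labels` and `scores` for a single input string.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A scoring function maps (segment, candidate labels) -> {label: score}.
ScoreFn = Callable[[str, Sequence[str]], Dict[str, float]]


def split_into_segments(text: str, max_words: int = 200, stride: int = 150) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Windows start `stride` words apart, so consecutive windows overlap by
    max_words - stride words; 0 < stride <= max_words ensures no word is skipped.
    """
    if not 0 < stride <= max_words:
        raise ValueError("need 0 < stride <= max_words")
    words = text.split()
    segments: List[str] = []
    start = 0
    while start < len(words):
        segments.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
        start += stride
    return segments


def classify_long_text(text: str, labels: Sequence[str], score_fn: ScoreFn,
                       max_words: int = 200, stride: int = 150) -> Tuple[str, Dict[str, float]]:
    """Score every segment, average the per-label scores, return the top label."""
    segments = split_into_segments(text, max_words, stride)
    if not segments:
        raise ValueError("text contains no words")
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        scores = score_fn(segment, labels)
        for label in labels:
            totals[label] += scores.get(label, 0.0)
    averaged = {label: totals[label] / len(segments) for label in labels}
    best = max(averaged, key=averaged.get)
    return best, averaged


def keyword_score(segment: str, labels: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a real model: add-one-smoothed share of label-word counts."""
    words = segment.lower().split()
    counts = {label: words.count(label.lower()) + 1 for label in labels}
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def hf_score_fn(classifier) -> ScoreFn:
    """Hypothetical adapter for a transformers zero-shot pipeline, e.g.
    pipeline("zero-shot-classification", model="facebook/bart-large-mnli").
    Assumes the pipeline returns {"labels": [...], "scores": [...]} per string."""
    def score(segment: str, labels: Sequence[str]) -> Dict[str, float]:
        out = classifier(segment, candidate_labels=list(labels))
        return dict(zip(out["labels"], out["scores"]))
    return score


if __name__ == "__main__":
    transcript = "sports " * 300 + "politics " * 50
    best, scores = classify_long_text(transcript, ["sports", "politics"], keyword_score)
    print(best, scores)
```

With the toy scorer, the demo text splits into two overlapping windows, and `sports` wins with an averaged score of about 0.87. Option (1) amounts to scoring only `split_into_segments(text)[0]`. With a real model, keep the window size comfortably under the model's maximum input length.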