Transcription summaries and actions

What is currently the best small model for summarising transcripts and extracting actions?

I’m looking at the <5B parameter class, or maybe <10B.

Transcripts will be produced by Whisper + pyannote diarization.

Audio clips will be at least 1 hour long, possibly as long as 6 hours in rare cases, so we can expect large transcripts.
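
To get a feel for how large these transcripts might be, here is a rough back-of-the-envelope sketch. The speaking rate (150 words per minute) and the tokens-per-word ratio (1.3) are assumptions for illustration, not measurements; real values depend on the speakers, the language, and the model's tokenizer.

```python
def estimate_tokens(hours: float, words_per_minute: int = 150,
                    tokens_per_word: float = 1.3) -> int:
    """Roughly estimate the token count of a transcript.

    Both defaults are assumed ballpark figures, not measured values.
    """
    words = hours * 60 * words_per_minute
    return round(words * tokens_per_word)


if __name__ == "__main__":
    for hours in (1, 6):
        print(f"{hours} h of audio: about {estimate_tokens(hours)} tokens")
```

Under these assumptions a 1 hour clip lands in the low tens of thousands of tokens and a 6 hour clip is several times that, before counting speaker labels or timestamps.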


For smaller models, I think the Llama 3.2 or Qwen 2.5 series are safe choices, but there may be more specific benchmarks on the leaderboard. The URL below is for the long-context version of Qwen.

Thanks for this, I wasn't aware of Qwen's long-context model :slight_smile:

Any thoughts on whether it would be better to use the long context and summarise in one go, compared to chunking the input into intermediate summaries?


Having the model summarize the whole transcript in one pass would probably be more accurate, but processing such a long context at once would probably need a huge amount of VRAM and add a lot of latency :sweat_smile: so it would probably be smarter to process it in chunks. I think short chunks are also easier for a small model to summarize.
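
The chunked approach could look something like the sketch below: group the diarized speaker turns into chunks under a word budget, summarize each chunk, then summarize the joined partial summaries. This is only a minimal illustration. `chunk_turns`, `summarize_transcript`, and the `summarize` callable are hypothetical names, and `summarize` stands in for whatever model call you end up using (it is not a real library API). The word budget is a crude proxy for a token limit.

```python
from typing import Callable, List


def chunk_turns(turns: List[str], max_words: int) -> List[str]:
    """Group consecutive speaker turns into chunks of at most max_words words.

    Turns are never split, so a single turn longer than max_words
    becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for turn in turns:
        n = len(turn.split())
        # Flush the current chunk if adding this turn would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(turn)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_transcript(turns: List[str],
                         summarize: Callable[[str], str],
                         max_words: int = 1500) -> str:
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is a placeholder for your own model call (an assumption).
    """
    partials = [summarize(chunk) for chunk in chunk_turns(turns, max_words)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))
```

For a 6 hour recording the joined partial summaries may themselves exceed the budget, in which case the second step would need to be repeated (or the partials chunked again). Action items can be extracted the same way by asking for them in the per-chunk prompt and merging the lists at the end.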