For smaller models, I think the Llama 3.2 or Qwen 2.5 series are safe bets, but there may be specific benchmarks on the leaderboard worth checking. The URL below is for the long-context version of Qwen.
Thanks for this, I wasn't aware of Qwen's long-context model.
Any thoughts on whether it would be better to use the long context and summarise in one go, compared to chunking the input into intermediate summaries?
Summarizing the whole document in one long context would probably be more accurate, but processing that much context at once needs a lot of VRAM and adds latency, so chunking is usually the smarter trade-off. Short chunks are also easier for a small model to summarize well.
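For what it's worth, here's a rough sketch of the chunked (map-reduce style) approach: summarize each chunk, then summarize the summaries. It assumes a local OpenAI-compatible server (llama.cpp, Ollama, vLLM, etc.); the base_url and model name are placeholders for whatever you actually run.

```python
from openai import OpenAI

# Placeholder endpoint and model name; swap in your local server and model.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
MODEL = "qwen2.5:7b"

def summarize(text: str, prompt: str = "Summarize the following text concisely:") -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"{prompt}\n\n{text}"}],
    )
    return resp.choices[0].message.content

def chunked_summary(document: str, chunk_chars: int = 8000) -> str:
    # Naive fixed-size chunking; splitting on paragraph or section
    # boundaries usually gives better intermediate summaries.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partials = [summarize(c) for c in chunks]
    # Second pass: merge the intermediate summaries into one.
    return summarize(
        "\n\n".join(partials),
        prompt="Combine these partial summaries into one coherent summary:",
    )
```

One caveat with this pattern: details that span chunk boundaries can get lost in the first pass, so overlapping the chunks a bit or chunking on natural section breaks tends to help.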