Which Hugging Face summarization model supports more than 1024 tokens? Which model is more suitable for programming-related articles?

If this is not the best place to ask this question, please point me to the right one.

I am planning to use Hugging Face summarization models (Models - Hugging Face) to summarize the transcriptions of my lecture videos.

So far I have tested facebook/bart-large-cnn and sshleifer/distilbart-cnn-12-6, but they support a maximum input of 1024 tokens.
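For reference, this is roughly how I am calling them (a minimal sketch; the transcript filename is just a placeholder). Anything past the 1024-token limit is simply cut off:

```python
from transformers import pipeline

# facebook/bart-large-cnn accepts at most 1024 input tokens
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# placeholder path for one of my lecture transcripts
with open("lecture_transcript.txt") as f:
    transcript = f.read()

# truncation=True silently drops everything after token 1024,
# so most of a long lecture never reaches the model
summary = summarizer(transcript, max_length=150, min_length=40, truncation=True)
print(summary[0]["summary_text"])
```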

So here are my questions:

1: Are there any summarization models that support longer inputs, e.g., a 10,000-word article?

2: What are the optimal output lengths for given input lengths? Let's say for a 1,000-word input, what should the optimal minimum output length (the min length of the summarized text) be?

3: Which model would be likely to work well on programming-related articles?

Please give me model names from this repository: Models - Hugging Face


Hi, not sure if you still need an answer to your question, but here are some options you can try (a minimal LED sketch follows the list):

  1. LED (16k token input length) - allenai/led-base-16384 · Hugging Face
  2. PRIMERA (~4k token input length) - allenai/PRIMERA · Hugging Face
  3. Unlimiformer (unlimited input length?) - abertsch/unlimiformer-bart-govreport-alternating · Hugging Face (read the description first!)
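Here is a minimal sketch of running LED on a long transcript. The transcript filename and generation lengths are placeholders, and note that allenai/led-base-16384 is a base checkpoint that has not been fine-tuned for summarization, so for real use you will likely want a fine-tuned LED variant:

```python
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

# base checkpoint with a 16k-token input window (not fine-tuned for summarization)
checkpoint = "allenai/led-base-16384"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint)

# placeholder path for a long lecture transcript
long_text = open("lecture_transcript.txt").read()
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=16384)

# LED expects global attention on at least the first token
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,   # placeholder output lengths; tune for your inputs
    min_length=64,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```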