Summarization on long documents

@ananddeshpande (and anyone else who still needs an answer to this question) - take a look at Unlimiformer: https://github.com/abertsch72/unlimiformer (the public repo for the preprint "Unlimiformer: Long-Range Transformers with Unlimited Length Input").


I am using NLTK to tokenize text, with a threshold of 512 tokens per segment. Some input segments still come out longer than 1024 tokens, so I set truncation=True, and now the code runs without a length-exceeded error. My concern: is there any data loss because of truncation=True? A second question: how can I make it faster? If I reduce the max_length parameter, will the code run faster? If not, please suggest something else. I am working on a PDF summarizer project using facebook/bart-large-cnn.
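On the truncation question: yes, truncation=True silently drops everything past the model's input limit, so there is data loss; and a smaller max_length does speed things up, since generation cost grows with output length. Below is a minimal sketch of the chunk-then-summarize approach described above, assuming the transformers pipeline API; chunk_sentences and summarize_long_text are illustrative helper names, not part of any library:

```python
# Requires: pip install nltk transformers, plus nltk.download("punkt")
from nltk.tokenize import sent_tokenize
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
tokenizer = summarizer.tokenizer

def chunk_sentences(text, max_tokens=512):
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    chunks, current, length = [], [], 0
    for sent in sent_tokenize(text):
        n = len(tokenizer.encode(sent, add_special_tokens=False))
        if current and length + n > max_tokens:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sent)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long_text(text):
    # Summarizing each chunk keeps every part of the document in play,
    # instead of losing everything past the 1024-token cutoff. A smaller
    # max_length also runs faster, because decoding is autoregressive
    # and cost scales with output length.
    parts = [
        summarizer(chunk, max_length=128, min_length=30, truncation=True)[0]["summary_text"]
        for chunk in chunk_sentences(text)
    ]
    return " ".join(parts)
```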

How can I use this model if I need to summarize table data?

Hello everyone,
I want to retrain a text summarization model on my dataset. I would like to use BART, but the problem I ran into is that BART cannot take more than 1024 tokens, while the documents in my dataset mostly run between 2,000 and 15,000 tokens. Does anyone have an idea how to handle this? Any suggestion would be great.
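Two common ways people handle this: split each document into windows of at most 1024 tokens (and summarize or train per window, as in the sketch above), or switch to a long-input model such as LED (allenai/led-base-16384), which is initialized from BART and accepts inputs up to 16,384 tokens. A minimal preprocessing sketch for the LED route, assuming a Hugging Face datasets-style dataset; the column names "document" and "summary" are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# LED is a BART-derived encoder-decoder with a 16,384-token input window,
# so 2k-15k-token documents fit without chunking.
tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

def preprocess(example):
    inputs = tokenizer(example["document"], max_length=16384, truncation=True)
    # LED benefits from global attention on the first token; many training
    # scripts pass a global_attention_mask marking position 0.
    inputs["global_attention_mask"] = [1] + [0] * (len(inputs["input_ids"]) - 1)
    labels = tokenizer(text_target=example["summary"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

# Usage with a Hugging Face dataset: tokenized = raw_dataset.map(preprocess)
```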