Summarization on long documents

@ananddeshpande (and anyone else who still needs an answer to this question) - take a look at Unlimiformer: https://github.com/abertsch72/unlimiformer (the public repo for the preprint "Unlimiformer: Long-Range Transformers with Unlimited Length Input").


I am using NLTK to tokenize text, with a threshold of 512 tokens per segment. Some input segments still come out longer than 1024 tokens, so I set truncation=True, and now the code runs without a length-exceeded error. My concern: is there any data loss because of truncation=True? A second question: how can I make it faster? If I reduce the max_length parameter, will the code run faster? If not, please suggest something else. I am working on a PDF summarizer project using facebook/bart-large-cnn.
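On the truncation question: yes, truncation=True silently drops everything past the model's input limit, so there is data loss; and a smaller max_length does speed things up, since generation cost grows with output length. Below is a minimal sketch of the chunk-then-summarize approach described above, assuming the transformers pipeline API; chunk_sentences and summarize_long_text are illustrative helper names, not part of any library:

```python
# Requires: pip install nltk transformers, plus nltk.download("punkt")
from nltk.tokenize import sent_tokenize
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
tokenizer = summarizer.tokenizer

def chunk_sentences(text, max_tokens=512):
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    chunks, current, length = [], [], 0
    for sent in sent_tokenize(text):
        n = len(tokenizer.encode(sent, add_special_tokens=False))
        if current and length + n > max_tokens:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sent)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long_text(text):
    # Summarizing each chunk keeps every part of the document in play,
    # instead of losing everything past the 1024-token cutoff. A smaller
    # max_length also runs faster, because decoding is autoregressive
    # and cost scales with output length.
    parts = [
        summarizer(chunk, max_length=128, min_length=30, truncation=True)[0]["summary_text"]
        for chunk in chunk_sentences(text)
    ]
    return " ".join(parts)
```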

How can I use this model if I need to summarize table data?

Hello everyone,
I want to retrain a text summarization model on my dataset. I would like to use BART, but the problem I ran into is that BART cannot take more than 1024 tokens, while the documents in my dataset mostly run between 2,000 and 15,000 tokens. Does anyone have an idea how to handle this? Any suggestion would be great.
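Two common ways people handle this: split each document into windows of at most 1024 tokens (and summarize or train per window, as in the sketch above), or switch to a long-input model such as LED (allenai/led-base-16384), which is initialized from BART and accepts inputs up to 16,384 tokens. A minimal preprocessing sketch for the LED route, assuming a Hugging Face datasets-style dataset; the column names "document" and "summary" are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# LED is a BART-derived encoder-decoder with a 16,384-token input window,
# so 2k-15k-token documents fit without chunking.
tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

def preprocess(example):
    inputs = tokenizer(example["document"], max_length=16384, truncation=True)
    # LED benefits from global attention on the first token; many training
    # scripts pass a global_attention_mask marking position 0.
    inputs["global_attention_mask"] = [1] + [0] * (len(inputs["input_ids"]) - 1)
    labels = tokenizer(text_target=example["summary"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

# Usage with a Hugging Face dataset: tokenized = raw_dataset.map(preprocess)
```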