How do I fine-tune BART for summarization using large texts?

thiagocmoreira · September 27, 2020, 4:02am

Good night!

I’m using a pre-trained BART for summarization and I have my own dataset for fine-tuning (each example pairs a long text with its summary). However, my input texts are approximately 2500 characters long and the maximum BART accepts is 1024 tokens. Is there any technique that lets me use the whole text? I thought of splitting each text into smaller pieces (max 1024) and assigning the same summary to each. Does that make sense?

Example:

Before:
ABC: summary1
DEF: summary2

After:
A: summary1
B: summary1
C: summary1
D: summary2
E: summary2
F: summary2
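
A minimal sketch of this splitting idea, in case it helps make the question concrete. It counts whitespace-separated words as a stand-in for tokens, which is an assumption: BART's 1024 limit applies to subword tokens from the model's own tokenizer, so real counts would be higher. The names `split_into_chunks` and `expand_dataset` are hypothetical helpers, not part of any library.

```python
from typing import Callable, List, Tuple


def split_into_chunks(
    text: str,
    max_tokens: int = 1024,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    `tokenize` defaults to whitespace splitting, a rough stand-in for
    the model's subword tokenizer.
    """
    tokens = tokenize(text)
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def expand_dataset(
    pairs: List[Tuple[str, str]], max_tokens: int = 1024
) -> List[Tuple[str, str]]:
    """Pair every chunk of a document with that document's full summary."""
    expanded = []
    for text, summary in pairs:
        for chunk in split_into_chunks(text, max_tokens):
            expanded.append((chunk, summary))
    return expanded


# Reproduces the Before/After example above with one "token" per chunk.
pairs = [("A B C", "summary1"), ("D E F", "summary2")]
for chunk, summary in expand_dataset(pairs, max_tokens=1):
    print(f"{chunk}: {summary}")
```

One thing this makes visible: every chunk is paired with the summary of the whole document, so a chunk may be trained against summary content that it does not contain.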

Thanks in advance!

valhalla · September 27, 2020, 7:00am

Hi, there’s already a thread for this; you might find it helpful.

thiagocmoreira · September 27, 2020, 3:17pm

Do you have any idea how I can do this extractive summarization step beforehand? I would have to cut my text roughly in half to reach the ideal size, but I don’t know how to pick the most relevant sentences in this extractive step.
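
To make the question concrete, here is a rough sketch of one very simple way such an extractive step could look: score each sentence by the average document-wide frequency of its words and keep the top-scoring sentences, in their original order, until a word budget is filled. This is only a baseline heuristic I am assuming for illustration, not necessarily what the other thread recommends; `select_sentences` is a hypothetical helper, the sentence splitter is a naive regex, and no stop-word filtering is done.

```python
import re
from collections import Counter


def select_sentences(text: str, max_words: int) -> str:
    """Keep the highest-scoring sentences, in original order, within a word budget."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Visit sentences from highest to lowest score, keeping those that still fit.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    kept, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            kept.append(i)
            used += n_words
    # Restore document order so the shortened text still reads naturally.
    return " ".join(sentences[i] for i in sorted(kept))


text = "BART summarizes text. BART is a model for text. Cats sleep."
print(select_sentences(text, max_words=5))
```

The shortened text would then be used as the model input in place of the full document, with the budget chosen so that the result fits within the model's limit after tokenization.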

valhalla · September 28, 2020, 6:15am

Could you post this question in that thread? People there might have tried this. Let’s keep the long-document summarization discussion in one thread.
