How do I fine-tune BART for summarization on long texts?

Good night!

I’m using a pre-trained BART model for summarization, and I have my own dataset for fine-tuning (each example pairs a long text with its summary). However, my input texts are approximately 2500 characters long, and the maximum BART accepts is 1024 tokens. Is there any technique that lets me use the whole text? I thought of splitting each text into smaller pieces (max 1024 each) and assigning the same summary to each piece. Does that make sense?

Example:

Before:
ABC: summary1
DEF: summary2

After:
A: summary1
B: summary1
C: summary1
D: summary2
E: summary2
F: summary2
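
The splitting idea in the example above can be sketched roughly as follows. This is only an illustration: `split_into_chunks` and `expand_dataset` are hypothetical helper names, and whitespace splitting stands in for the real BART tokenizer, so the counts will not match BART's actual token counts.

```python
from typing import Callable, List, Tuple


def split_into_chunks(
    text: str,
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = str.split,
    detokenize: Callable[[List[str]], str] = " ".join,
) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    ASSUMPTION: by default tokens are whitespace-separated words. For BART,
    pass the model tokenizer's own tokenize/detokenize functions instead and
    leave room for the special start/end tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenize(text)
    return [
        detokenize(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def expand_dataset(
    pairs: List[Tuple[str, str]], max_tokens: int
) -> List[Tuple[str, str]]:
    """Pair every chunk of a text with that text's full summary."""
    expanded = []
    for text, summary in pairs:
        for chunk in split_into_chunks(text, max_tokens):
            expanded.append((chunk, summary))
    return expanded


# Mirrors the Before/After example, with one "token" per chunk.
pairs = [("A B C", "summary1"), ("D E F", "summary2")]
for chunk, summary in expand_dataset(pairs, max_tokens=1):
    print(f"{chunk}: {summary}")
```

One caveat with this layout: each chunk is paired with a summary of the whole text, so the target may mention things the chunk itself never contains.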

Thanks in advance!

Hi, there’s already a thread for this; you might find it helpful.

Do you have any idea how I can do that extractive summarization step first? I would have to cut my text roughly in half to reach the ideal size, but I don’t know how to pick the most relevant sentences in this extractive step.
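
One very simple way to pick sentences, shown here only as a sketch and not as the method discussed in the other thread, is to score each sentence by the average frequency of its words in the whole text and keep the top-scoring sentences that fit a length budget. The names `split_sentences` and `select_sentences` are hypothetical, the sentence splitter is naive, and the budget is counted in words as a stand-in for BART tokens.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def select_sentences(text: str, max_words: int) -> str:
    """Keep the highest-scoring sentences that fit within max_words.

    ASSUMPTION: relevance is approximated by the mean document frequency
    of a sentence's words; this is a crude heuristic, not a trained model.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        if not words:
            return 0.0
        return sum(freq[w] for w in words) / len(words)

    # Rank sentence indices from most to least relevant.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )

    chosen = []
    used = 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length

    # Restore the original order so the shortened text still reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))


text = "The cat sat on the mat. The cat ate. Dogs bark loudly."
print(select_sentences(text, max_words=6))
```

Plain frequency scoring tends to favor sentences full of common words, so removing stop words before counting is a likely first improvement.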

Could you post this question in that thread? People there might have tried this. Let’s keep the long-summarization discussion in one thread :slight_smile:
