I am looking to fine-tune a BART-large model for a summarization task and I am creating a dataset to tune on. How should I structure this dataset? Should it have a column of text blocks and another column with associated summaries? Or, will simply providing the raw text (the text blocks) without summaries suffice? Thanks!
You can find the seq2seq fine-tuning scripts here. The README explains how the data should be formatted and saved.
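To answer the structure question directly: supervised fine-tuning for summarization needs paired examples, so raw text without summaries won't be enough. Here is a minimal sketch of what such a paired dataset could look like, written as JSON lines with one example per line. The field names `text` and `summary`, the file name `train.jsonl`, and the helpers `write_pairs` and `read_pairs` are assumptions made up for illustration; check the README for the exact layout and file names the scripts expect.

```python
import json
from pathlib import Path


def write_pairs(pairs, path):
    """Write (text, summary) pairs as JSON lines, one example per line.

    Hypothetical helper: the "text" and "summary" keys are illustrative,
    not something the fine-tuning scripts are known to require.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text, summary in pairs:
            # Every source document needs a target summary to train against.
            if not text.strip() or not summary.strip():
                raise ValueError("every example needs both a text and a summary")
            record = {"text": text, "summary": summary}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_pairs(path):
    """Read the JSON lines file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    examples = [
        ("First long article body goes here.", "Short summary of the first article."),
        ("Second long article body goes here.", "Short summary of the second article."),
    ]
    out = write_pairs(examples, "train.jsonl")
    print(len(read_pairs(out)), "examples written to", out)
```

You would usually write separate files for the training, validation, and test splits in the same way.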