Fine-tuning Dataset Requirements

Buckeyes2019 · September 4, 2020, 2:56pm

I am looking to fine-tune a BART-large model for a summarization task and I am creating a dataset to tune on. How should I structure this dataset? Should it have a column of text blocks and another column with associated summaries? Or, will simply providing the raw text (the text blocks) without summaries suffice? Thanks!

valhalla · September 6, 2020, 7:59am

Hi @Buckeyes2019
You can find the seq2seq fine-tuning scripts here. The readme explains how the data should be formatted and saved.
https://github.com/huggingface/transformers/tree/master/examples/seq2seq
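To answer the question directly: these seq2seq scripts train with supervision. Each document needs its reference summary, so raw text alone is not enough for this setup. The sketch below assumes the layout the legacy seq2seq README used at the time: a data directory holding `train.source`/`train.target` (and likewise `val.*` and `test.*`), where line i of the `.source` file is a document and line i of the `.target` file is its summary. Check the current readme before relying on this, because the layout may have changed. The script starts from a two-column CSV (the `text` and `summary` column names are hypothetical placeholders) and writes one split in that line-aligned format.

```python
import csv
import tempfile
from pathlib import Path


def load_pairs(csv_path, text_col="text", summary_col="summary"):
    """Read (document, summary) pairs from a CSV with one row per example.

    The column names are hypothetical placeholders; change them to match your file.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row[text_col], row[summary_col]) for row in csv.DictReader(f)]


def write_split(pairs, data_dir, split):
    """Write <split>.source and <split>.target so that line i of each file
    holds the i-th document and its reference summary (assumed layout)."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with open(data_dir / f"{split}.source", "w", encoding="utf-8") as src, \
         open(data_dir / f"{split}.target", "w", encoding="utf-8") as tgt:
        for text, summary in pairs:
            # Collapse internal newlines/whitespace: an embedded newline would
            # shift every later example and misalign the two files.
            src.write(" ".join(text.split()) + "\n")
            tgt.write(" ".join(summary.split()) + "\n")


if __name__ == "__main__":
    # Small demo: build a toy CSV, then write train.source / train.target.
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "data.csv"
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text", "summary"])
            writer.writerow(["A long article\nspanning two lines.", "Short summary."])
            writer.writerow(["Another document.", "Another summary."])
        write_split(load_pairs(csv_path), Path(tmp) / "data", "train")
        print((Path(tmp) / "data" / "train.source").read_text(encoding="utf-8"))
```

Run `write_split` once per split (`train`, `val`, `test`) into the same directory. Flattening whitespace is a deliberate choice here. It keeps the two files line-aligned, at the cost of losing paragraph breaks inside documents.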
