Where to find documentation on dataset format for finetuning

pvelosipednikov · October 7, 2023, 6:34pm

I want to finetune a BART model loaded through AutoModelForSeq2SeqLM.

Where do I find the format I need to preprocess the dataset into?
I think the answer may be the arguments of the model's forward method (the tokenizer only produces those inputs)?

So in my case, would I need to make sure that my dataset has the following columns:
input_ids, attention_mask, decoder_input_ids, decoder_attention_mask?

Would this potentially change depending on what head I use for finetuning, and if so where would I turn to see those requirements?
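
One way to check this directly, as a rough sketch rather than an official recipe: read the argument names off the forward method of the model class with Python's inspect module. The helper names below (forward_argument_names, unexpected_columns, bart_forward_arguments) are made up for illustration. The sketch assumes transformers and PyTorch are installed, and relies on AutoModelForSeq2SeqLM resolving to BartForConditionalGeneration for BART checkpoints.

```python
import inspect


def forward_argument_names(model_cls):
    """Return the parameter names of model_cls.forward, excluding self."""
    signature = inspect.signature(model_cls.forward)
    return [name for name in signature.parameters if name != "self"]


def unexpected_columns(dataset_columns, model_cls):
    """List the dataset columns that model_cls.forward has no argument for."""
    accepted = set(forward_argument_names(model_cls))
    return [col for col in dataset_columns if col not in accepted]


def bart_forward_arguments():
    """Hypothetical convenience wrapper for the BART case.

    Assumption: transformers (with PyTorch) is installed. Only the class is
    imported here, so no model weights are downloaded.
    """
    from transformers import BartForConditionalGeneration

    return forward_argument_names(BartForConditionalGeneration)
```

Calling bart_forward_arguments() should list the names the seq2seq class accepts, and passing a different class (for example a sequence classification one) to forward_argument_names shows whether the accepted columns change with the head.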
