Fine-tuning distilBART

Hi there,

I am not a native English speaker, so please bear with me. I am currently trying to figure out how to fine-tune distilBART on financial data (similar to finBERT). The examples/seq2seq README states:

For the CNN/DailyMail dataset, (relatively longer, more extractive summaries), we found a simple technique that works: you just copy alternating layers from bart-large-cnn and finetune more on the same data.
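The "copy alternating layers" idea from the README can be sketched as follows. Note that `pick_layers_to_copy` is a hypothetical helper, not the actual implementation in the repo; it just illustrates one reasonable scheme, picking evenly spaced teacher layers while always keeping the first and last.

```python
# Sketch of the "copy alternating layers" idea: choose which teacher
# layers to reuse when building a shallower student decoder.
# (Hypothetical helper for illustration, not the repo's actual code.)

def pick_layers_to_copy(n_teacher_layers: int, n_student_layers: int) -> list:
    """Return indices of teacher layers to copy into the student,
    spaced evenly from the first layer to the last."""
    if n_student_layers == 1:
        return [0]
    step = (n_teacher_layers - 1) / (n_student_layers - 1)
    return [round(i * step) for i in range(n_student_layers)]

# e.g. distilling the 12-layer bart-large-cnn decoder to a 3-layer student
print(pick_layers_to_copy(12, 3))  # → [0, 6, 11]
```

The copied student is then fine-tuned further on the same (or new) summarization data, which is what the question below is about.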

As I understand this sentence, I can only fine-tune a distilBART student with the same data the teacher was trained on (CNN/DM). Or can I use my own dataset, which is completely different from the one the BART teacher was trained on?

Thanks in advance

You can fine-tune distilBART on any data you want; the question is how well different approaches will perform.

Without knowing much more about the data, and assuming you want training to finish in under 24 hours, I would probably start from sshleifer/distilbart-cnn-12-3.


Thanks for the reply.
I hope I will get decent results once I have understood the whole fine-tuning process.