I am not a native English speaker, so please excuse any mistakes in my question. I am currently trying to figure out how to fine-tune distilBART on financial data (similar to finBERT). The examples/seq2seq README states:
For the CNN/DailyMail dataset, (relatively longer, more extractive summaries), we found a simple technique that works: you just copy alternating layers from
bart-large-cnn and finetune more on the same data.
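To check that I read "copy alternating layers" correctly, here is a minimal sketch of how I picture it. It is plain Python with no library calls. The helper `pick_alternating_layers` and the placeholder layer names are hypothetical (I made them up for illustration), and the actual script in the repo may select different layers.

```python
from typing import List


def pick_alternating_layers(n_teacher: int, n_student: int) -> List[int]:
    """Pick n_student roughly evenly spaced layer indices out of n_teacher.

    Hypothetical helper: keeps the first and last teacher layer and spreads
    the remaining picks evenly in between.
    """
    if not 1 <= n_student <= n_teacher:
        raise ValueError("n_student must be between 1 and n_teacher")
    if n_student == 1:
        return [0]
    step = (n_teacher - 1) / (n_student - 1)
    return [round(i * step) for i in range(n_student)]


# Stand-ins for the teacher's 12 decoder layers (assumed layer count).
teacher_layers = [f"teacher_layer_{i}" for i in range(12)]

# Build a 6-layer student by copying the selected teacher layers.
student_indices = pick_alternating_layers(len(teacher_layers), 6)
student_layers = [teacher_layers[i] for i in student_indices]

print(student_indices)
print(student_layers)
```

With real models the strings would be actual layer modules whose weights get copied into the student. As far as I can tell, the fine-tuning step comes after that copy.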
As far as I understand this sentence, I can only fine-tune a distilBART student on the same data the teacher was trained on (CNN/DM). Is that correct, or can I use my own dataset, which is completely different from the one the BART teacher was trained on?
Thanks in advance