We have several data sources for Whisper fine-tuning, so we have two options:
Merge: Convert/merge the datasets and fine-tune on them
Chain: Fine-tune on dataset DS1, then from the best checkpoint fine-tune on DS2, etc.
I’m thinking about issues like dataset cleanliness, differences in audio duration / possible chunking, keeping the machine on for many days because the merged dataset is large, etc.
What is the proper/suggested method for fine-tuning transformers models?
Hello, did you ever find a good solution to the options you outlined? Would love to learn more.
Unfortunately, I have not had time to try them yet. My current work is about fine-tuning on several languages with Common Voice, using different splitting algorithms, since CV was not used to train Whisper.
It is important to have distinct voices in the train-dev-test splits.
Therefore I decided NOT to merge them, but to use the other datasets as test sets instead.
I’ll go further with NVIDIA models for comparison.
Unfortunately, the dataset cards of all those models are not detailed enough to show which voices/sentences/utterances were used during training. It would make things worse if I reused them in fine-tuning.
If you are sure this is not the case, my opinion is to merge them before fine-tuning, for better results. With chaining, you start each stage from a more “fixated” knowledge state, and the order of the datasets becomes yet another hyperparameter.