I wonder what the best approach is to fine-tune a wav2vec model to different domains in a low-resource language. For example, say I have 100 hours of data recorded in a professional studio with high-quality equipment, and 10 hours of low-quality data, and assume my use case is the low-quality samples.
Should I first fine-tune wav2vec on my language with the large high-quality dataset, and then fine-tune again on the smaller set (which is my use case)? Should I train only on the smaller set? Or should I use all the data and give more weight to the low-quality data (e.g. duplicate samples with minor random noise), as in the sketch below?
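To make the weighting idea concrete, this is roughly what I had in mind by "duplicate samples with minor random noise". It is only a minimal sketch assuming raw waveforms as numpy arrays; the SNR range is just a placeholder for my low-quality conditions:

```python
import numpy as np

def add_noise(waveform: np.ndarray, snr_db: float = 15.0) -> np.ndarray:
    """Mix in white noise at a target SNR to roughly mimic the low-quality channel."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

# toy example: one second of 16 kHz "clean" audio, duplicated with random SNRs
clean = np.random.randn(16000).astype(np.float32)
augmented_copies = [add_noise(clean, snr_db=float(np.random.uniform(5, 20))) for _ in range(3)]
```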
Is it even possible to fine-tune a pre-trained model twice? (i.e. fine-tuning an already fine-tuned model)
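For context, what I mean by fine-tuning twice is something like the following (a sketch assuming the HuggingFace transformers API; the checkpoint name is hypothetical):

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# hypothetical checkpoint: a model already fine-tuned on the 100h high-quality set
processor = Wav2Vec2Processor.from_pretrained("my-org/wav2vec2-highquality-100h")
model = Wav2Vec2ForCTC.from_pretrained("my-org/wav2vec2-highquality-100h")

# optionally keep the convolutional feature encoder frozen for the second stage
model.freeze_feature_encoder()

# ...then run the usual CTC fine-tuning loop / Trainer on the 10h low-quality data
```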
I saw this post:
I assume this is the same as separating the language model from the acoustic model? I.e., if I want to train one AM and use two different LMs, do I have to train wav2vec again, or can I use the LM as a separate layer on top of the AM?
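By "LM as a separate layer on top of the AM" I was picturing beam-search decoding with an external n-gram LM, e.g. with pyctcdecode, so the same acoustic model can be paired with different LMs without retraining. A rough sketch (the .arpa paths, toy vocabulary and logits are placeholders, not real files):

```python
import numpy as np
from pyctcdecode import build_ctcdecoder

# labels must match the acoustic model's CTC output vocabulary (toy vocab here)
labels = ["", "a", "b", "c", " "]

# one acoustic model, different LMs swapped in only at decode time
# (hypothetical .arpa paths; with kenlm_model_path=None you get plain beam search)
decoder_general = build_ctcdecoder(labels, kenlm_model_path="general_domain.arpa")
decoder_target = build_ctcdecoder(labels, kenlm_model_path="target_domain.arpa")

# logits: (time, vocab) array from one forward pass of the same acoustic model
logits = np.log(np.full((50, len(labels)), 1.0 / len(labels), dtype=np.float32))
print(decoder_general.decode(logits))
print(decoder_target.decode(logits))
```

In other words, the AM stays fixed and only the decoder's LM changes.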