Further train a fine-tuned wav2vec model

Hi everyone,

I wonder what the best approach is to fine-tune a wav2vec model for different domains in a low-resource language. For example, suppose I have 100 hours of data recorded in a professional studio with high-quality equipment, and 10 hours of low-quality data. Now assume my use case is the low-quality samples.
Should I first train wav2vec on my language with the large high-quality dataset, and then fine-tune on the smaller one (which is my use case)? Or should I train only on the smaller one? Or use all the data and give more weight to the low-quality data (e.g. duplicate those samples with minor random noise)?
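One way to upweight the low-quality data, as mentioned above, is to duplicate those samples with small random perturbations. A minimal sketch with NumPy; the duplication factor and noise scale here are assumptions for illustration, not tuned values:

```python
import numpy as np

def augment_with_noise(waveform, n_copies=3, noise_scale=0.005, seed=0):
    """Return the original waveform plus n_copies noisy duplicates.

    noise_scale is relative to the signal's RMS, so the perturbation
    stays minor regardless of the recording level. Both parameters are
    illustrative defaults, not recommended values.
    """
    rng = np.random.default_rng(seed)
    rms = float(np.sqrt(np.mean(waveform ** 2))) or 1.0
    copies = [waveform]
    for _ in range(n_copies):
        noise = rng.normal(0.0, noise_scale * rms, size=waveform.shape)
        copies.append((waveform + noise).astype(waveform.dtype))
    return copies
```

Each augmented copy then goes into the training set alongside the original, effectively giving the low-quality domain more weight without changing the loss function.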

Is it even possible to fine-tune a pre-trained model twice (i.e. fine-tuning an already fine-tuned model)?
I saw this post:

but there is no answer…


Since you linked my post - I couldn't fine-tune the wav2vec model. I used a different model instead
(silero speech2text),

but I couldn't improve the results; after further fine-tuning I was only able to match them.

So I had 0 success.


I assume it’s the same as separating the language model from the acoustic model? I.e. if I want to train one AM and use 2 different LMs, do I have to train wav2vec again? (Or can I use the LM as a separate layer on top of the AM?)
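On that last point: with a CTC acoustic model like wav2vec, the LM is usually applied only at decoding time (e.g. an n-gram LM in a beam-search decoder, or shallow fusion), so you can swap LMs without retraining the AM. A toy sketch of the shallow-fusion idea, with made-up probabilities purely to show how the scores combine (the weight and distributions are assumptions, not real model output):

```python
import math

def shallow_fusion_score(am_log_probs, lm_log_probs, lm_weight=0.5):
    """Combine per-candidate acoustic and language model log-probs:
    score(token) = log P_AM(token) + lm_weight * log P_LM(token).
    Swapping the LM only changes lm_log_probs; the AM stays untouched.
    """
    floor = math.log(1e-9)  # backoff for tokens the LM has never seen
    return {
        tok: am_log_probs[tok] + lm_weight * lm_log_probs.get(tok, floor)
        for tok in am_log_probs
    }

# Hypothetical next-token candidates from the acoustic model and an LM.
am = {"cat": math.log(0.6), "cap": math.log(0.4)}
lm = {"cat": math.log(0.9), "cap": math.log(0.1)}

scores = shallow_fusion_score(am, lm, lm_weight=0.5)
best = max(scores, key=scores.get)
# A different LM could flip the choice without touching the AM.
```

So two different LMs just mean two different decoding configurations over the same trained wav2vec acoustic model.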