Effect of different sample rates while finetuning an XLSR ASR model

yilmazay · April 27, 2023, 8:58am

Hi everyone,
I have recently been trying to fine-tune the “facebook/wav2vec2-xls-r-300m” model on a Turkish dataset. I already have an ASR model that we built with Kaldi, which has a WER of around 10%. I expected to reach a better WER after adding about 100 hours of our own data. However, each time I add more data, the WER gets worse. The first run gave a WER of 20% and the second 37%. Finally, I started a training run with about 40 hours of additional data; it started at a WER of 68%, which decreased to 57% after 3 epochs. I expect some further decrease by epoch 30, but judging from my previous training runs, the WER only drops by around 20 to 30% from its initial value.
When I look at other people's pretrained ASR models on HF, they all have much more acceptable WERs, below 20%.
So, I started to think that I might be doing something wrong.
My guess is that the sample rate is the problem: the dataset I am using for fine-tuning is 8 kHz, whereas almost all pretrained XLSR ASR models are 16 kHz.
During fine-tuning I upsample my data to 16 kHz.
What I want to know is:
1- Is it OK to fine-tune a 16 kHz base model with 8 kHz data that has been upsampled to 16 kHz?
2- Should I use original 16 kHz data for fine-tuning a 16 kHz base model?
3- Is it possible to convert a 16 kHz base model to an 8 kHz model without losing performance (that is, with the same WER)?
I appreciate any guidance on this issue.
Thanks in advance.
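
For reference, the upsampling step mentioned above (8 kHz to 16 kHz) can be sketched roughly as follows. This is only a minimal pure-Python illustration using linear interpolation, not the exact code from the training pipeline; the function name `upsample_linear` and its parameters are hypothetical. In practice a band-limited resampler from an audio library would normally be used instead. Either way, upsampling only changes the sample grid: it cannot restore content above 4 kHz, which is the Nyquist limit of the original 8 kHz recordings.

```python
def upsample_linear(samples, orig_sr=8000, target_sr=16000):
    """Resample a list of float samples by linear interpolation.

    Hypothetical helper for illustration only: linear interpolation is a
    crude resampler and adds no information above orig_sr / 2.
    """
    if not samples:
        return []
    # Number of output samples covering the same duration.
    n_out = int(len(samples) * target_sr / orig_sr)
    # Step between output samples, measured in input-sample units.
    ratio = orig_sr / target_sr
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * ratio              # fractional position in the input
        lo = min(int(pos), last)     # input sample at or before pos
        hi = min(lo + 1, last)       # next input sample (clamped at the end)
        frac = pos - lo              # interpolation weight toward hi
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    audio_8k = [0.0, 1.0, 0.0, -1.0]
    audio_16k = upsample_linear(audio_8k, orig_sr=8000, target_sr=16000)
    print(len(audio_8k), "->", len(audio_16k))
```

With a 2x ratio, every second output sample equals an original sample and the samples in between are averages of their neighbours, so the signal gets twice as many samples but no new high-frequency detail.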
