Dhivehi ASR: Fine-Tuning Wav2Vec2

Hello everyone.

I’m fine-tuning XLSR-Wav2Vec2 on Dhivehi (dv) using Common Voice, which has 18 hours of validated Dhivehi data.
I followed the notebook on Turkish and got a WER of 0.54 (trained for 60 epochs). Changing the learning rate to 5e-4 did not improve WER, but a learning rate of 1e-4 slightly improved WER to 0.52. Both of these runs were trained for 30 epochs.
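For context on these numbers: WER counts word-level substitutions, deletions, and insertions against the reference transcript, divided by the number of reference words. A WER of 0.52 therefore means roughly one word error for every two reference words. The following is a minimal plain-Python sketch of that standard edit-distance definition, for illustration only. It is not the metric code the Turkish notebook uses, it applies no text normalization, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the model outputs many extra words.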

Now I’m looking into how I can improve this.
I’d welcome any collaboration, discussion, or help on improving it.


I have tried different learning rates and dropout values, but WER did not improve much.
So far I have found that a learning rate of 1e-4 works best for me.
Though there isn’t a huge improvement in WER, both WER and validation loss do go down as training progresses.
Still, the best WER I have got is 0.52.
The Dhivehi dataset is very small, and unfortunately Common Voice is the only dataset available for Dhivehi.