Turkish ASR: Fine-Tuning Wav2Vec2

Hey, I fine-tuned an XLSR-Wav2Vec2 model on Turkish.

My training code can be found here: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_🤗_Transformers.ipynb

Training details:

Data: Official Common Voice Train + Validation dataset amounting to 3478 data samples

Data preprocessing: I removed all special characters like .,!?
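The stripping step can be sketched with a simple regex (a sketch covering only the `.,!?` characters mentioned above; the lowercasing and the function name are my additions):

```python
import re

# Punctuation to strip before building the character vocabulary.
# Only .,!? are mentioned above; extend the set as needed (assumption).
chars_to_ignore = r"[\.\,\!\?]"

def remove_special_characters(text: str) -> str:
    """Remove punctuation and lowercase the transcript (lowercasing is an assumption)."""
    return re.sub(chars_to_ignore, "", text).lower().strip()

print(remove_special_characters("Merhaba, dünya!"))  # merhaba dünya
```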

Hyperparameter tuning:

  • Learning rate: an LR of 1e-5 didn’t work well; 3e-4 did
  • Dropout: I tried multiple combinations of attention_dropout, hidden_dropout, and feat_proj_dropout, each either disabled (0.0) or enabled (0.1)
  • Layerdrop: A rate of 0.1 worked well for me
  • mask_time_prob: I tried 0.1 and 0.05 → 0.05 seemed to work better
  • Epochs: I trained for 30 epochs and saw the WER nicely decreasing; I could probably have trained a bit longer for better results
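Collected as a plain dict, the settings that worked above look like this (key names follow 🤗 Transformers conventions; the dropout values are just one of the combinations tried, so treat this as an illustrative sketch, not the exact training script):

```python
# Model-side hyperparameters reported above (illustrative sketch).
model_kwargs = {
    "attention_dropout": 0.1,   # one of the 0.0 / 0.1 combinations tried
    "hidden_dropout": 0.1,      # one of the 0.0 / 0.1 combinations tried
    "feat_proj_dropout": 0.1,   # one of the 0.0 / 0.1 combinations tried
    "layerdrop": 0.1,           # 0.1 worked well
    "mask_time_prob": 0.05,     # 0.05 beat 0.1
}

# Trainer-side hyperparameters reported above.
training_kwargs = {
    "learning_rate": 3e-4,      # 1e-5 didn't work well
    "num_train_epochs": 30,     # WER was still decreasing at 30 epochs
}
```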

Training Graph:
You can see some of my training stats here: https://wandb.ai/patrickvonplaten/huggingface/reports/Project-Dashboard–Vmlldzo1[…]539b1pmkfbfrd96pxtvzuycfkdc13sp1cy3wl161g2s9whrcikebv20rte35o9le

Any ideas on how training could be improved?


Thank you very much for sharing the notebook; I’m going to use it for my own fine-tuning process. :+1:


Great explanation of the process, so thank you for this effort @patrickvonplaten :slightly_smiling_face: I started the same process to try different (also quite basic) preprocessing steps. Then I will try different hyperparameter settings, if Colab lets me :pray: Even though I have Colab Pro, it often limits my GPU connection and prevents me from accessing it. So if you have any suggestions on how to tune hyperparameters faster or more easily, I’d be glad to hear them.


FYI: In Turkish you can omit the ' character from your vocabulary (in reference to: this).
It’s usually just used for possessives, e.g. Ceyda’s == Ceyda’nın, or to separate other kinds of suffixes from special named entities. It doesn’t affect pronunciation.
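Since the apostrophe only separates suffixes from named entities and doesn’t change pronunciation, it can be deleted outright during preprocessing (a minimal sketch; the function name is mine):

```python
import re

def strip_apostrophes(text: str) -> str:
    """Drop straight and typographic apostrophes so suffixed names
    like "Ceyda'nın" collapse to a single pronounceable token."""
    return re.sub(r"['’]", "", text)

print(strip_apostrophes("Ceyda'nın"))  # Ceydanın
```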

I just finished reading the blog post + code… The end results were :laughing: not that I was expecting much with so little :label: data. Now I’m even more motivated to improve it ~ I have plans :sunglasses:
Edit: oops, it looks like I clicked the wrong reply button, @ozcangundes. I know you know Turkish and didn’t need the extra explanation :laughing:


We are currently trying to get GPU compute power for everybody - we’ll let you know soon :slight_smile:


tiny tiny bug: a couple of places forget to subtract 1 from len(list), like in here
should be:

random.randint(0, len(common_voice_train) - 1)
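As a side note (mine, not from the thread): `random.randint` includes both endpoints, while `random.randrange` excludes its stop value, so the latter avoids the manual `-1` entirely:

```python
import random

items = ["a", "b", "c"]

# randint(a, b) samples from the inclusive range [a, b], hence the -1:
idx = random.randint(0, len(items) - 1)

# randrange(stop) samples from the half-open range [0, stop), no -1 needed:
idx2 = random.randrange(len(items))

assert 0 <= idx < len(items) and 0 <= idx2 < len(items)
```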

Great catch! Do you want to open a PR to fix it and tag me for review?

Done here :+1:
Because ipynb files are finicky about changes (diffs), I just modified the relevant code block.

Hi all! I have just completed my initial experiment with 40 epochs and a 2e-3 learning rate (the other hparams were kept the same as in @patrickvonplaten’s demo notebook). I ran the code on Kaggle since Colab Pro restricted my GPU usage due to long usage time :pensive: It took more than 9 hours, and Kaggle also stopped working :sweat_smile: However, the validation loss and WER did not change dramatically even though the training loss decreased sharply, as seen below.

Also, keep in mind that fine-tuning the CNN layers does not lead to better results, as suggested by the authors and the demo notebook. :sweat_smile: I gave it a shot, but the score (50% WER on the test set) was not as good as the demo notebook’s, either.

Good luck with your experiments :pray:


Hi! Thanks for the great tutorial. I fine-tuned a pre-trained model (facebook/wav2vec2-large-xlsr-53-french) and got encouraging results. However, there are two things I wonder about (and I think the answers could help some other readers):

  1. In both your notebook, @patrickvonplaten, and @ozcangundes’s screenshots, and also in my own runs, the validation loss decreases during the first iterations but then starts going up, while the WER keeps going down. Do you know why this happens, and can we be sure that WER is the best metric to track (for early stopping at least), especially as I use a custom LM for prediction once my model is done training?

  2. Which are the main hyperparameters to change to try to improve performance? (I have limited GPU time, and any insights or educated guesses on where to start would help ;))
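On question 1: WER is word-level edit distance normalized by the reference length, so it can keep improving even while the CTC validation loss drifts up. A minimal sketch of what the tracked metric computes (my own implementation, assuming whitespace tokenization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match / substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("bir iki üç", "bir oops üç"))  # 0.3333333333333333
```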

Thanks again!