Hey, I fine-tuned an XLSR-Wav2Vec2 model on Turkish.
My training code can be found here: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_🤗_Transformers.ipynb
Data: Official Common Voice Train + Validation dataset, amounting to 3,478 samples
Data preprocessing: I removed all special characters like .,!?
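The preprocessing can be sketched as a simple regex map over the transcriptions (a sketch; the exact character set beyond `.,!?` and the column name `sentence` are assumptions on my part):

```python
import re

# Punctuation to strip; the post only names .,!? explicitly,
# the rest of this set is an assumed extension.
chars_to_remove = r"[\,\?\.\!\-\;\:\"]"

def remove_special_characters(batch):
    # Strip punctuation and lowercase so the CTC vocabulary
    # ends up containing only letters plus the word delimiter.
    batch["sentence"] = re.sub(chars_to_remove, "", batch["sentence"]).lower()
    return batch

sample = {"sentence": "Merhaba, dünya!"}
print(remove_special_characters(sample)["sentence"])  # merhaba dünya
```

With 🤗 Datasets you would typically apply this via `dataset.map(remove_special_characters)`.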
- Learning rate: An LR of 1e-5 didn’t work well; 3e-4 worked well
- Dropout: I tried multiple combinations of disabling (setting to 0.0) and enabling (setting to 0.1) the model’s dropout layers
- Layerdrop: A rate of 0.1 worked well for me
- mask_time_prob: I tried 0.1 and 0.05 → 0.05 seemed to work better
- I trained for 30 epochs and saw the WER nicely decreasing. I could have probably trained a bit longer to get better results.
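Put together, the settings above roughly correspond to a config like this (a sketch only; the checkpoint name, output dir, batch size, and which dropout kwargs carry the 0.1 are my assumptions, not confirmed by the post):

```python
from transformers import Wav2Vec2ForCTC, TrainingArguments

# Assumed base checkpoint; the post doesn't name it explicitly.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.1,   # one of the 0.0 / 0.1 combinations tried
    hidden_dropout=0.1,
    layerdrop=0.1,           # rate that worked well
    mask_time_prob=0.05,     # worked better than 0.1
)

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-turkish",  # placeholder
    per_device_train_batch_size=16,              # placeholder
    learning_rate=3e-4,                          # 1e-5 was too low
    num_train_epochs=30,
)
```

These kwargs follow the `transformers` Wav2Vec2 config and `TrainingArguments` APIs; anything not stated in the bullets above is a placeholder.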
You can see some of my training stats here: https://wandb.ai/patrickvonplaten/huggingface/reports/Project-Dashboard–Vmlldzo1[…]539b1pmkfbfrd96pxtvzuycfkdc13sp1cy3wl161g2s9whrcikebv20rte35o9le
Any ideas on how training could be improved?