Turkish ASR: Fine-Tuning Wav2Vec2

patrickvonplaten · March 18, 2021, 7:40am

Hey, I fine-tuned an XLSR-Wav2Vec2 model on Turkish.

My training code can be found here: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_🤗_Transformers.ipynb

Training details:

Data: Official Common Voice Train + Validation dataset amounting to 3478 data samples

Data preprocessing: I removed all special characters like .,!?
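A minimal sketch of that preprocessing step, assuming the characters to drop are exactly the four listed in the post (the post says "like", so the real set may be larger). The names `CHARS_TO_REMOVE` and `remove_special_characters` are illustrative and not necessarily what the notebook uses.

```python
import re

# Assumption: only the four characters named in the post are removed.
CHARS_TO_REMOVE = re.compile(r"[.,!?]")


def remove_special_characters(text: str) -> str:
    """Return the transcript with the listed punctuation stripped."""
    return CHARS_TO_REMOVE.sub("", text)


print(remove_special_characters("Merhaba, nasılsın?"))
```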

Hyperparameter tuning:

Learning rate: An LR of 1e-5 didn’t work well; 3e-4 worked well
Dropout: I disabled (setting to 0.0) / enabled (setting to 0.1) multiple combinations of attention_dropout, hidden_dropout, feat_proj_dropout
Layerdrop: A rate of 0.1 worked well for me
mask_time_prob: I tried 0.1 and 0.05 → 0.05 seemed to work better
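To make the dropout search concrete, here is a small sketch that enumerates every on/off combination of the three dropout settings named above. The parameter names and the two values (0.0 and 0.1) come from the post; the helper `dropout_grid` is hypothetical, and the post does not say whether all eight combinations were actually run or which one was kept.

```python
from itertools import product

# Parameter names and candidate values as mentioned in the post.
DROPOUT_PARAMS = ("attention_dropout", "hidden_dropout", "feat_proj_dropout")
DROPOUT_VALUES = (0.0, 0.1)  # 0.0 = disabled, 0.1 = enabled


def dropout_grid():
    """Return one dict per combination of dropout settings."""
    return [
        dict(zip(DROPOUT_PARAMS, values))
        for values in product(DROPOUT_VALUES, repeat=len(DROPOUT_PARAMS))
    ]


grid = dropout_grid()
for config in grid:
    print(config)
```

Each dict could then be merged with the other settings from the list (for example a layerdrop of 0.1 and a mask_time_prob of 0.05) before starting a run.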

Time:

I trained for 30 epochs and saw the WER nicely decreasing. I could have probably trained a bit longer to get better results.

Training Graph:
You can see some of my training stats here: https://wandb.ai/patrickvonplaten/huggingface/reports/Project-Dashboard–Vmlldzo1[…]539b1pmkfbfrd96pxtvzuycfkdc13sp1cy3wl161g2s9whrcikebv20rte35o9le

Any ideas on how training could be improved?

ayameRushia · March 18, 2021, 9:12am

Thank you very much for sharing the notebook. I’m going to use it for my own fine-tuning.

ozcangundes · March 18, 2021, 10:11am

Great explanations about the process, so thank you for this effort @patrickvonplaten. I started the same process to try different (and still very basic) preprocessing steps. Then I will try different hyperparameter settings, if Colab lets me. Even though I have Colab Pro, it often limits my GPU connection and prevents me from accessing it. So if you have any suggestions for tuning hyperparameters faster or more easily, I will be glad to hear them.

ceyda · March 18, 2021, 11:59am

FYI: In Turkish you can omit the ' character from your vocabulary (in reference to this).
It’s usually just used as a possessive pronoun(?), similar to Ceyda’s == Ceyda’nın, or to separate other types of suffixes from special named entities. It doesn’t affect pronunciation.
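A rough sketch of what dropping the apostrophe does to a character vocabulary. The two example sentences and the helpers `drop_apostrophes` and `build_vocab` are made up for illustration; stripping both the straight (') and the curly (’) form is an assumption about what shows up in the transcripts.

```python
# Assumption: transcripts may contain either apostrophe form.
APOSTROPHES = str.maketrans("", "", "'’")


def drop_apostrophes(text: str) -> str:
    """Remove straight and curly apostrophes from a transcript."""
    return text.translate(APOSTROPHES)


def build_vocab(sentences):
    """Map every distinct character in the sentences to an integer id."""
    chars = sorted(set("".join(sentences)))
    return {char: index for index, char in enumerate(chars)}


# Hypothetical example transcripts.
sentences = ["Ceyda'nın kitabı", "Ankara’ya gidiyorum"]

vocab_with = build_vocab(sentences)
vocab_without = build_vocab([drop_apostrophes(s) for s in sentences])

print(len(vocab_with), len(vocab_without))
```

The second vocabulary has no apostrophe entries, so the model has fewer output characters to learn.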

I just finished reading the blog post and code… Not that I was expecting much from the end results with so little data. Now I’m even more motivated to improve it ~ I have plans
Edit: oops, it looks like I clicked the wrong reply button. @ozcangundes, I know you know Turkish and didn’t need the extra explanation.

patrickvonplaten · March 18, 2021, 12:29pm

We are currently trying to get GPU compute power for everybody - we’ll let you know soon

ceyda · March 18, 2021, 1:16pm

Tiny bug: a couple of places forget to subtract 1 from len(list), like here.
It should be:

random.randint(0, len(common_voice_train)-1)
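For context, `random.randint(a, b)` includes both endpoints, so `random.randint(0, len(x))` can return `len(x)`, which is one past the last valid index. A small sketch with a made-up list standing in for `common_voice_train` (the helper `pick_random_index` is hypothetical):

```python
import random

# Hypothetical stand-in for the real dataset.
common_voice_train = ["sample_0", "sample_1", "sample_2"]


def pick_random_index(dataset):
    """Return a random valid index; randint's upper bound is inclusive."""
    return random.randint(0, len(dataset) - 1)


index = pick_random_index(common_voice_train)
print(index, common_voice_train[index])
```

`random.randrange(len(common_voice_train))` would be an equivalent way to write this, since `randrange` excludes its upper bound.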

patrickvonplaten · March 18, 2021, 3:35pm

Great catch! Do you want to open a PR to fix it and tag me for review?

ceyda · March 18, 2021, 6:31pm

Done here
Because ipynb files are finicky about changes (diffs), I just modified the relevant code block.

ozcangundes · March 20, 2021, 8:04am

Hi all! I have just completed my initial experiment with 40 epochs and a 2e-3 learning rate (the other hyperparameters are kept the same as in @patrickvonplaten’s demo notebook). I ran the code on Kaggle, since Colab Pro restricted my GPU usage due to long usage time. It took more than 9 hours, and Kaggle also stopped working. However, the validation loss and WER did not change dramatically even though the training loss decreased sharply, as seen below.

Also, keep in mind that fine-tuning the CNN layers does not lead to better results, as suggested by the authors and the demo file. I gave it a shot, but the score (50% WER on the test set) was not as good as the demo file’s.

Good luck with your experiments

pbo · May 31, 2021, 4:13pm

Hi! Thanks for the great tutorial. I fine-tuned a pre-trained model (facebook/wav2vec2-large-xlsr-53-french) and got encouraging results. However, there are 2 things I wonder about (and I think the answers could help some fellow readers):

1. In both your notebook @patrickvonplaten and @ozcangundes’ screenshots, and also from what I experienced, the validation loss decreases during the first iterations but then starts going up while WER keeps going down. Do you know why this is happening, and can we be sure that WER is the best metric to track (for early stopping at least), especially as I use a custom LM for prediction once my model is done training?
2. Which are the main hyperparameters to change to try to improve performance? (I have limited GPU time, and any insights or educated guesses on where to start could help ;))

Thanks again!
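For readers less familiar with the metric discussed in this thread, here is a minimal sketch of how WER is commonly defined: the word-level edit distance (substitutions, insertions, deletions) between the reference and the prediction, divided by the number of reference words. This is a plain-Python illustration, not the implementation used in the notebook, and the function name `word_error_rate` is made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref_words)


print(word_error_rate("bu bir test", "bu test"))
```

Because WER is computed on decoded words while the validation loss is computed on the model's output probabilities, the two are measuring different things.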
