Hi there! I only had a couple of hours, so I ran a quick end-to-end test.
It turned out I only needed to make small modifications to the great Turkish ASR tutorial by @patrickvonplaten.
You can check my training and inference code here:
Also, model here:
Training details:
- Used a proper Thai word tokenizer in the pre-processing step
- Most of the parameters are the same as in the original script, but I changed the number of epochs to 6
- With 6 epochs, training took around 50 minutes and reached WER ~ 0.68 at the end, which looks pretty good to me!
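Since Thai is written without spaces between words, the WER number depends on how the text is split into words. Here is a minimal sketch of how WER can be computed with a pluggable tokenizer. This is not the exact code from my notebook: the names `edit_distance` and `wer` and the `tokenize` parameter are made up for illustration, and the default whitespace split is only a stand-in for a real Thai word tokenizer.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references: List[str],
        hypotheses: List[str],
        tokenize: Callable[[str], List[str]] = str.split) -> float:
    """Corpus-level WER: total word errors divided by total reference words.

    `tokenize` is an assumption of this sketch; for Thai, pass a real word
    tokenizer instead of the whitespace split used by default.
    """
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        hyp_tokens = tokenize(hyp)
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_words += len(ref_tokens)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words
```

With a score computed this way, WER ~ 0.68 would mean roughly 68 word-level errors per 100 reference words, so the same tokenizer should be applied to both the references and the predictions.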
Next steps:
- Do more hyperparameter tuning
- Look more closely at the decoding pipeline. It may help
- Finish reading all wav2vec2 articles by @patrickvonplaten and improve this work
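For context on the decoding step, here is a rough sketch of greedy CTC decoding, the simplest way to turn per-frame predictions into text. It is an illustration only, not my actual pipeline: the function name `ctc_greedy_decode`, the toy vocabulary, the blank id of 0, and the `|` word delimiter are all assumptions for the example.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int],
                      vocab: List[str],
                      blank_id: int = 0,
                      word_delimiter: str = "|") -> str:
    """Collapse repeated frame predictions, drop blanks, and join characters.

    `frame_ids` holds the most likely token id for each audio frame
    (for example, the argmax over the model's output logits).
    """
    # Step 1: merge consecutive duplicates, e.g. [2, 2, 0, 2] -> [2, 0, 2]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank token that separates repeated characters
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # Step 3: turn the word delimiter into spaces
    return "".join(chars).replace(word_delimiter, " ").strip()


# Toy vocabulary: index 0 is the blank token, index 1 the word delimiter.
toy_vocab = ["<pad>", "|", "a", "b"]
decoded = ctc_greedy_decode([2, 2, 0, 2, 3, 3, 1, 1, 3, 0], toy_vocab)
```

Greedy decoding uses no language model, so replacing it with beam search plus a language model is one of the things worth checking.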
Feel free to let me know if there is anything I could improve. Thanks!