Currently, the only pre-trained Wav2Vec2 models that cover Indonesian are multilingual ones. We would therefore like to pre-train Wav2Vec2 on Indonesian datasets only.
We will start from a randomly initialized Wav2Vec2 model (the large model, if possible).
In addition to the Indonesian Common Voice (18h), we have also collected the following Indonesian speech datasets:
- Wavenet Synthetic Voice (>400h)
- TIML-IDN (14.5h)
- Bible.is (40h)
- Podcast (>10,000h)
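
To give a feel for how the listed corpora compare in size, here is a minimal sketch that tallies the hours quoted above. It assumes that every ">" figure is counted at its stated minimum, so the total is only a lower bound, and that the podcast figure means 10,000 hours. The dictionary keys and helper names are illustrative and do not come from any library.

```python
# Rough lower-bound tally of the speech data listed above, in hours.
# Figures marked ">" in the list are counted at their stated minimum.
DATASET_HOURS = {
    "Common Voice": 18.0,
    "Wavenet Synthetic Voice": 400.0,  # lower bound from the list
    "TIML-IDN": 14.5,
    "Bible.is": 40.0,
    "Podcast": 10_000.0,               # lower bound, assuming "k" means thousand
}


def total_hours(hours_by_dataset):
    """Return the summed duration of all datasets, in hours."""
    return sum(hours_by_dataset.values())


def share_of_total(hours_by_dataset):
    """Return each dataset's fraction of the total duration."""
    total = total_hours(hours_by_dataset)
    return {name: hours / total for name, hours in hours_by_dataset.items()}


if __name__ == "__main__":
    total = total_hours(DATASET_HOURS)
    print(f"Total (lower bound): {total:.1f} h")
    for name, share in share_of_total(DATASET_HOURS).items():
        print(f"  {name}: {share:.1%}")
```

Under these assumptions the podcast data makes up the vast majority of the audio, so the pre-trained model would mostly see podcast-style speech unless the sources are rebalanced during sampling.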
FlaxWav2Vec2 will be merged soon (see "[Flax] Add wav2vec2" by patrickvonplaten, Pull Request #12271 in huggingface/transformers), and a pretraining script should be relatively easy to merge.
The goal is the best Indonesian ASR model.
We have a team from the last wav2vec2 event: