Pretrain Wav2Vec2 in Russian
Currently, the only pretrained Wav2Vec2 checkpoints that cover Russian are multilingual. Let’s pretrain a Wav2Vec2 model on Russian speech only.
2. Language
The model will be trained on Russian speech.
3. Model
A randomly initialized Wav2Vec2 model
4. Datasets
CommonVoice (part of the dataset) - 35h
SberGolos (Golos) - 1095h
SovaDataset - 400h
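To get a feel for the overall data budget, here is a minimal sketch that sums the hours listed above and shows each dataset's share of the total. The dictionary keys and the `summarize` helper are illustrative names, not part of any library; the hour counts are the ones given in this post.

```python
# Hours of audio per dataset, as listed in this post.
DATASET_HOURS = {
    "CommonVoice (part)": 35,
    "Golos": 1095,
    "SovaDataset": 400,
}


def summarize(hours_by_dataset):
    """Return the total hours and each dataset's fraction of that total."""
    total = sum(hours_by_dataset.values())
    shares = {name: hours / total for name, hours in hours_by_dataset.items()}
    return total, shares


total, shares = summarize(DATASET_HOURS)
print(f"Total: {total}h")  # Total: 1530h
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the data")
```

Golos makes up most of the audio, so the pretrained model will mostly reflect the recording conditions of that dataset.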
5. Training scripts
We can use the run_wav2vec2_pretrain_flax script to train the model.
6. (Optional) Challenges
The datasets total roughly 1,530 hours of audio, so training will take a long time.
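A rough way to reason about this is to divide the total audio by the training throughput. The sketch below does that arithmetic; the throughput values are hypothetical placeholders, not measurements, and would need to be replaced with a number measured on the actual hardware. The function name is also just illustrative.

```python
# Total audio, summed from the hours listed in the Datasets section.
TOTAL_AUDIO_HOURS = 35 + 1095 + 400


def wall_clock_hours_per_epoch(total_audio_hours, audio_hours_per_hour):
    """Wall-clock hours for one pass over the data.

    audio_hours_per_hour is the training throughput: how many hours of
    audio the training loop consumes per hour of wall-clock time.
    """
    if audio_hours_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return total_audio_hours / audio_hours_per_hour


# HYPOTHETICAL throughputs, chosen only to illustrate the calculation.
ASSUMED_THROUGHPUTS = (50, 100, 200)

for throughput in ASSUMED_THROUGHPUTS:
    hours = wall_clock_hours_per_epoch(TOTAL_AUDIO_HOURS, throughput)
    print(f"{throughput} audio-h per wall-clock h: {hours:.1f}h per epoch")
```

Pretraining usually needs many epochs, so the per-epoch figure has to be multiplied by the planned number of passes over the data.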
7. (Optional) Desired project outcome
The best Russian ASR model.