Pretrain Wav2vec2 in Russian


Currently, the only Wav2Vec2 model available for Russian is a multilingually pretrained one. Let's pretrain a Wav2Vec2 model on Russian speech only.

2. Language

The model will be trained in Russian.

3. Model

A randomly initialized Wav2Vec2 model

4. Datasets

Common Voice (partial) - 35h
SberGolos (Golos) - 1095h
SovaDataset - 400h
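
As a quick check on the data budget, the listed sizes add up to about 1,530 hours, most of it from Golos. The minimal Python sketch below computes each dataset's share of the total. It assumes the hour counts listed above and, as an illustration only, proportional sampling across datasets (the proposal does not specify a mixing strategy). The names DATASET_HOURS and mixture_weights are hypothetical.

```python
from __future__ import annotations

# Hours per dataset, as listed in the proposal (hypothetical variable name).
DATASET_HOURS = {
    "Common Voice (partial)": 35,
    "Golos": 1095,
    "SovaDataset": 400,
}


def mixture_weights(hours: dict[str, float]) -> dict[str, float]:
    """Return each dataset's fraction of the total audio.

    Assumption: datasets are sampled in proportion to their size;
    this is illustrative, not part of the original plan.
    """
    total = sum(hours.values())
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    total_hours = sum(DATASET_HOURS.values())
    print(f"Total: {total_hours} h")
    for name, share in mixture_weights(DATASET_HOURS).items():
        print(f"{name}: {share:.1%}")
```

With proportional sampling, Golos would account for roughly 72% of the training audio and Common Voice for only about 2%, which is worth keeping in mind if speaker or domain balance matters.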

5. Training scripts

We can make use of run_wav2vec2_pretrain_flax to train the model.

6. (Optional) Challenges

The combined datasets contain about 1,530 hours of audio, so training will take a long time.

7. (Optional) Desired project outcome

The best Russian ASR model.

@anton-l


Would be awesome to see this happen!
