PreTrain Wav2Vec2 in Dhivehi

eyna · June 29, 2021, 2:24pm

Currently, the only pretrained Wav2Vec2 model that covers Dhivehi is a multilingual one. We would like to pretrain a Wav2Vec2 model on Dhivehi only.

Model

A randomly initialized Wav2Vec2 model
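As a rough illustration of what "randomly initialized" means here, the sketch below builds a Wav2Vec2 pretraining model from a config instead of loading a checkpoint, so every weight starts from random values. It assumes `transformers` and `torch` are installed and uses the PyTorch class `Wav2Vec2ForPreTraining` as a stand-in for the Flax model this thread refers to. The small layer sizes are illustrative assumptions, not the project's settings.

```python
# Sketch only: a Wav2Vec2 pretraining model with random weights.
# Assumption: PyTorch class used as a stand-in for the Flax model.
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining

# Illustrative (hypothetical) sizes chosen to keep the example light.
config = Wav2Vec2Config(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)

# Constructing from a config (not from_pretrained) gives random weights.
model = Wav2Vec2ForPreTraining(config)

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} randomly initialized parameters")
```

Loading with `from_pretrained` would instead start from an existing checkpoint, which is what this project wants to avoid.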

Datasets

Common Voice: 18 hr in the latest released dataset [32+ hr if the mid-2021 dataset is released in time]
Podcast data [30 hr]
Others []
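To make the audio budget concrete, here is a small tally of the hours quoted above. The variable and function names are made up for illustration, the "others" source is left out because no figure is given for it, and 32 is treated as a lower bound for the mid-2021 Common Voice release.

```python
# Hours quoted in the post; names are illustrative, not from any library.
PODCAST_HOURS = 30
COMMON_VOICE_CURRENT_HOURS = 18
COMMON_VOICE_MID_2021_HOURS = 32  # the post says "32hrs+", so a lower bound


def total_hours(common_voice_hours: int, podcast_hours: int = PODCAST_HOURS) -> int:
    """Sum the unlabeled audio available for pretraining, in hours."""
    return common_voice_hours + podcast_hours


current_total = total_hours(COMMON_VOICE_CURRENT_HOURS)
expected_total = total_hours(COMMON_VOICE_MID_2021_HOURS)

print(current_total)   # 48
print(expected_total)  # 62
```

So the project would start from roughly 48 hours of audio, or at least 62 hours if the newer Common Voice release arrives in time.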

Available training scripts

FlaxWav2Vec2 will be merged soon (see "[Flax] Add wav2vec2" by patrickvonplaten, Pull Request #12271 in huggingface/transformers on GitHub), and a pretraining script should be relatively easy to merge after that.

(Optional) Desired project outcome

The best Dhivehi ASR model

(Optional) Challenges

Scraping publicly available Dhivehi audio from various sources.

politecat314 · June 29, 2021, 2:36pm

I am interested and would like to join this project.

patrickvonplaten · June 29, 2021, 2:37pm

Awesome! Let’s finalize it directly

Ankit-Kumar-Saini · July 1, 2021, 12:35pm

This is a very interesting project. I have always wanted to work on a speech recognition task. This is a great opportunity to learn and contribute. Looking forward to being a part of this project.

Topic	Category	Replies	Views	Activity
PreTrain Wav2Vec2 in Indonesian	Flax/JAX Projects	1	366	June 29, 2021
PreTrain Wav2Vec2 in Persian	Flax/JAX Projects	0	1176	July 8, 2021
PreTrain Wav2Vec2 in German	Flax/JAX Projects	7	1365	July 7, 2021
PreTrain Wav2Vec2 in Swedish	Flax/JAX Projects	3	963	June 29, 2021
PreTrain Wav2Vec2 in Turkish	Flax/JAX Projects	1	415	July 2, 2021
