PreTrain Wav2Vec2 in Spanish

mariagrandury · June 30, 2021, 5:29pm

Currently, the only Wav2Vec2 model available for Spanish was pretrained multilingually. Let’s pretrain a Wav2Vec2 model on Spanish only.

Model

A randomly initialized Wav2Vec2 model.
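
As a rough sketch of what "randomly initialized" means here: the model is built from a configuration object instead of being loaded with from_pretrained, so no pretrained weights are involved. The snippet assumes a transformers release that ships the Flax Wav2Vec2 classes from the pull request mentioned under "Available training scripts". It uses the default Wav2Vec2Config, which may not be the configuration the team ends up choosing, and the helper name build_random_wav2vec2 is made up for illustration.

```python
def build_random_wav2vec2(seed=0):
    """Return a Flax Wav2Vec2 pretraining model with freshly initialized weights."""
    # Imported lazily so that defining this helper does not require JAX/Flax.
    from transformers import Wav2Vec2Config, FlaxWav2Vec2ForPreTraining

    # Default hyperparameters; an assumption, not the team's chosen setup.
    config = Wav2Vec2Config()

    # Passing a config (instead of calling from_pretrained) gives random
    # weights; `seed` controls the PRNG used for initialization.
    return FlaxWav2Vec2ForPreTraining(config, seed=seed)
```

Calling build_random_wav2vec2() with the same seed should give the same starting parameters, which helps when comparing runs.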

Datasets

We can use the Spanish portion of Common Voice. The dataset is available through the datasets library here: common_voice · Datasets at Hugging Face.
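
For reference, loading the Spanish portion could look roughly like the following. It assumes the dataset is still published under the common_voice identifier with an es configuration and that the installed datasets version can load it; load_spanish_common_voice is a hypothetical helper name.

```python
DATASET_ID = "common_voice"  # dataset identifier named in the post
LANGUAGE = "es"              # assumed configuration name for Spanish


def load_spanish_common_voice(split="train"):
    """Load one split of the Spanish portion of Common Voice."""
    # Imported lazily so the constants above can be reused without `datasets`.
    from datasets import load_dataset

    # Downloads (and caches) the data on first use; this can take a while.
    return load_dataset(DATASET_ID, LANGUAGE, split=split)
```

Wav2Vec2 pretraining is self-supervised, so only the audio is needed at this stage; the transcriptions matter later, when fine-tuning for ASR.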

Available training scripts

FlaxWav2Vec2 will be merged soon (see "[Flax] Add wav2vec2 by patrickvonplaten · Pull Request #12271 · huggingface/transformers · GitHub"), and a pretraining script should be relatively easy to merge.

(Optional) Desired project outcome

The best Spanish ASR model.

(Optional) Challenges

It would be nice to use more data than just the Common Voice dataset.

mrm8488 · June 30, 2021, 5:31pm

I am in!
cc: @patrickvonplaten @valhalla

valhalla · June 30, 2021, 5:42pm

Awesome, added you both to the team

edugp · June 30, 2021, 7:12pm

I am interested in this one as well, if there’s still time to join!

pcuenq · July 1, 2021, 12:22pm

I took part in the Wav2Vec2 fine-tuning week for Spanish; this is my forum post. My model card describes some of the pre-processing and training steps I took.

It would be awesome to create a pretrained model using Jax/Flax. I’m not sure I’ll have the time to take part in this one (I already signed up for a different project, and I know nothing about Jax), but I’ll try to follow your discussions if I can. Good luck!

