Speech-To-Text in 60 languages
What it is about
The goal of the event is to provide the community with state-of-the-art XLSR-Wav2Vec2 speech recognition models in as many languages as possible. We hope that research in speech recognition for low-resource languages in particular can benefit from it.
How does it work?
Participants have one week to fine-tune as many XLSR-Wav2Vec2 models as they want, on as many of the ~60 languages of Common Voice as they want. Each fine-tuned model should then be evaluated on Common Voice's test data for the respective language. All data can be used as training data except the official test data (compliance will be checked by the Hugging Face team). After the fine-tuning week, the best-performing model for each language will receive SWAG.
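Rankings on the test data are typically based on word error rate (WER). As an illustration only (not the official evaluation script), here is a minimal WER implementation using word-level edit distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

In practice you would compute this over the whole test split and average; lower is better.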
What do I need to do to participate?
All you need is a Google Colab account to run the fine-tuning script we provide. If you can train the model on a local GPU, even better, but a free Google Colab is enough. This fine-tuning week should be especially interesting to you if you are a native speaker of a low-resource language, because your language skills can help you build a better data processing pipeline than the competition.
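"Data processing pipeline" here mostly means transcript normalization, which is where language knowledge matters: deciding which characters are real parts of the alphabet and which are noise to strip. A hedged sketch of such a step (the exact character set is language-specific and purely illustrative here):

```python
import re

# Hypothetical normalization step: which punctuation to remove, and whether
# lowercasing is even meaningful, depends entirely on the target language.
CHARS_TO_REMOVE = re.compile(r'[,\.\?\!\-\;\:"]')

def normalize(sentence: str) -> str:
    """Lowercase and strip punctuation so model outputs match references."""
    return CHARS_TO_REMOVE.sub("", sentence).lower().strip()

print(normalize("Hello, World!"))  # -> "hello world"
```

A native speaker might extend this with rules for abbreviations, numerals, or diacritics that a generic pipeline would get wrong.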
If you want to participate, all you need to do is sign up to the hub here and post your name and your hub username in this thread. The Hugging Face team will then add you to an internal Slack channel where you will receive more detailed information!
What do I get?
- enjoy a bit of the Hugging Face vibe by joining the fine-tuning week
- a fine-tuned model under your name on the hub
- Hugging Face SWAG if you manage to have the best-performing model in a language
Patrick & Suraj