We are scaling multi-lingual speech recognition systems - come join us for the robust speech community event from Jan 24th to Feb 7th. With compute provided by OVHcloud, we are going from 50 to 70+ languages, from 300M to 2B-parameter models, and from toy evaluation datasets to real-world audio evaluation.
The goal of the event is to provide the community with robust speech recognition systems in as many languages as possible. We especially hope that low-resource languages will benefit from this event.
The main components of the speech recognition event are:
- Meta AI’s state-of-the-art XLS-R model
- The newest Common Voice datasets, 7 & 8
- Language model boosted decoding with Wav2Vec2
- Real-world audio for evaluation
Participants have two weeks to build as many robust speech recognition systems in as many languages as they want. In general, speech recognition systems can consist of:
- Fine-tuned speech recognition checkpoints (e.g. XLS-R)
- Language model boosted decoders (e.g. pyctcdecode + n-gram)
- Pre- and post-processing modules, such as noise-canceling, spelling correction, …
During the event, you will have the opportunity to work on each of these components to build speech recognition systems in your favorite language!
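To make the decoder and post-processing components a little more concrete, here is a minimal sketch in plain Python. It is not the event's pipeline: it shows plain greedy CTC decoding (no language model), which is the baseline that an LM-boosted decoder such as pyctcdecode would replace. The toy vocabulary, the blank index, the `|` word delimiter, and the function names `greedy_ctc_decode` and `postprocess` are all assumptions made up for illustration.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary (an assumption for this sketch):
# index 0 is the CTC blank token, "|" marks a word boundary.
VOCAB = ["<blank>", "|", "a", "c", "t"]
BLANK_ID = 0


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    vocab: List[str] = VOCAB,
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse repeated frame predictions, then drop blank tokens."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank_id:
            tokens.append(vocab[idx])
        previous = idx
    return "".join(tokens)


def postprocess(text: str, word_delimiter: str = "|") -> str:
    """Turn word delimiters into spaces and normalize whitespace."""
    return " ".join(text.replace(word_delimiter, " ").split())


# Per-frame argmax IDs, as a fine-tuned acoustic model might produce them.
frames = [2, 2, 0, 1, 3, 3, 0, 2, 2, 4, 0]
transcript = postprocess(greedy_ctc_decode(frames))
print(transcript)
```

In a real system the frame IDs would come from a fine-tuned checkpoint, and the greedy step would typically be swapped for beam search with an n-gram language model.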
Each speech recognition system will automatically be evaluated on real-world audio (if available for the language). After the fine-tuning week, the best-performing systems of each language will receive SWAG.
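The announcement does not say which metric the automatic evaluation uses. Word error rate (WER) is a common choice for speech recognition, so, purely as an assumption, the sketch below shows how WER can be computed from a reference transcript and a system output. The function name `word_error_rate` is hypothetical, and the sketch does no text normalization (casing, punctuation), which a real evaluation would likely need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.3f}")
```

Lower is better: 0.0 means the output matches the reference word for word, and values above 1.0 are possible when the system inserts many extra words.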
To participate, simply fill out this short Google form. You will also need to create a Hugging Face Hub account here and join our Discord here - when joining the event’s Discord channel, please make sure to click on the emoji under the first message to access all relevant information. OVHcloud kindly offered to provide a limited number of GPUs for participants if needed - if you would like to have access to a GPU, please join the Discord for more information*. Here are some detailed videos on how to get started with setting up an OVHcloud account.
This fine-tuning week should be especially interesting to native speakers of low-resource languages. Your language skills will help you select the best training data, and possibly build the best existing speech recognition system in your language.
More detailed information will be announced in the Discord channel. We are looking forward to seeing you there!
- enjoy a bit of Hugging Face vibe
- learn how to build state-of-the-art speech recognition systems
- free compute to build a powerful fine-tuned model under your name on the Hub
- Hugging Face SWAG if you manage to have the best performing model in a language
- 100 GPU hours from OVHcloud if you manage to have the best performing model in a language
Anton, Omar, Nico & Patrick