Hey hey!
We are on a mission to democratise speech technology: increase the language coverage of current state-of-the-art (SoTA) speech recognition and push the limits of what is possible. Come join us from December 5th to 19th for a community sprint powered by Lambda. Through this sprint, we’ll cover 70+ languages and model sizes from 39M to 1550M parameters, and evaluate our models on real-world evaluation datasets.
Register your interest via the Google form here.
What is the sprint about?
The goal of the sprint is to fine-tune Whisper in as many languages as possible and make the resulting models accessible to the community. We hope that low-resource languages in particular will benefit from this event.
The main components of the sprint consist of:
- OpenAI’s state-of-the-art Whisper model
- Public datasets like Common Voice 11, VoxPopuli, CoVoST2 and more
- Real-world audio for evaluation
How does it work?
Participants have two weeks to fine-tune Whisper checkpoints in as many languages as they want. The end goal is to build robust language-specific models that generalise well to real-world data. In general, each model repository on the Hugging Face Hub should consist of:
- Fine-tuned Whisper checkpoint (e.g. Whisper-large)
- Evaluation script for your fine-tuned checkpoint
- Hugging Face space to demo your fine-tuned model
The best part is that we’ll provide the fine-tuning, evaluation and demo scripts, so you can focus on model performance.
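To give you a flavour of what the fine-tuning script does under the hood, here’s a minimal sketch using the Transformers and Datasets libraries. The checkpoint size (whisper-small) and language (Hindi) below are illustrative picks, not requirements:

```python
from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Illustrative choices: the "small" checkpoint and Hindi ("hi") from Common Voice 11.
model_name = "openai/whisper-small"
processor = WhisperProcessor.from_pretrained(model_name, language="Hindi", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_name)

common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="train")
# Whisper expects 16 kHz audio, so resample on the fly.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel spectrogram features for the encoder.
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenised transcription as the decoder labels.
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(prepare, remove_columns=common_voice.column_names)
# From here, training runs with a standard Seq2SeqTrainer;
# the scripts we provide wire this up end to end.
```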
During the event, you will have the opportunity to work on each of these components to build speech recognition systems in your favourite language!
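The demo component can be as simple as a small Gradio app in a Hugging Face Space wrapping your fine-tuned checkpoint. A minimal sketch (the model id is a placeholder for your own):

```python
import gradio as gr
from transformers import pipeline

# Placeholder model id; point this at your fine-tuned checkpoint on the Hub.
asr = pipeline("automatic-speech-recognition", model="your-username/whisper-small-hi")

def transcribe(audio_path):
    # The pipeline returns a dict with the transcription under "text".
    return asr(audio_path)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),  # microphone recording or file upload
    outputs="text",
    title="Fine-tuned Whisper demo",
)
demo.launch()
```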
Each Whisper checkpoint will automatically be evaluated on real-world audio (where available for the language). After the sprint, the best-performing system for each language will receive SWAG.
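Under the hood, evaluation boils down to transcribing held-out audio and scoring the predictions against reference transcripts with the word error rate (WER). A rough sketch with the Evaluate library (model id, audio file and reference are placeholders):

```python
import evaluate
from transformers import pipeline

# Placeholders for illustration: your fine-tuned checkpoint and a test clip.
asr = pipeline("automatic-speech-recognition", model="your-username/whisper-small-hi")
prediction = asr("sample.wav")["text"]

# Word error rate: lower is better.
wer_metric = evaluate.load("wer")
wer = wer_metric.compute(predictions=[prediction], references=["the reference transcript"])
print(f"WER: {wer:.2%}")
```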
What do I need to do to participate?
To participate, simply fill out this short Google form. You will also need to create a Hugging Face Hub account here and join our Discord here. Make sure to head over to #role-assignment and click on “ML for Audio and Speech”.
This sprint should be especially interesting to native speakers of low-resource languages. Your language skills will help you select the best training data and, quite possibly, build the best speech recognition system for your language.
More details will be announced in the Discord channel. We are looking forward to seeing you there!
What do I get?
- learn how to fine-tune state-of-the-art Whisper speech recognition checkpoints
- free compute to build a powerful fine-tuned model under your name on the Hub
- Hugging Face SWAG if you build the best-performing model in a language
- more GPU hours if you have the best-performing model in a language
Open-sourcely yours,
Sanchit, VB & The HF Speech Team