[Open-to-the-community] Whisper fine-tuning event

Hey hey!

We are on a mission to democratise speech, increase the language coverage of current SoTA speech recognition and push the limits of what is possible. Come join us from December 5th - 19th for a community sprint powered by Lambda. Through this sprint, we’ll cover 70+ languages, 39M - 1550M parameters & evaluate our models on real-world evaluation datasets.

Register your interest via the Google form here.

What is the sprint about :question:

The goal of the sprint is to fine-tune Whisper in as many languages as possible and make them accessible to the community. We hope that especially low-resource languages will profit from this event.

The main components of the sprint consist of:

How does it work :gear:

Participants have two weeks to fine-tune Whisper checkpoints in as many languages as they want. The end goal is to build robust language-specific models that generalise well to real-world data. In general, the model repository on the Hugging Face Hub should consist of:

  • Fine-tuned Whisper checkpoint (e.g. Whisper-large)
  • Evaluation script for your fine-tuned checkpoint
  • Hugging Face space to demo your fine-tuned model

The best part is that we’ll provide fine-tuning, evaluation and demo scripts so that you can focus on model performance.

During the event, you will have the opportunity to work on each of these components to build speech recognition systems in your favourite language!

Each Whisper checkpoint will automatically be evaluated on real-world audio (if available for the language). After the two-week sprint, the best-performing systems of each language will receive :hugs: SWAG.
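To give a feel for what such an evaluation might measure, here is a minimal sketch of word error rate (WER), a common speech recognition metric. The post does not say which metric the automatic evaluation uses, so WER is an assumption here, and `word_error_rate` is a hypothetical helper, not the event's official script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better: 0.0 means the transcription matches the reference word for word. Real evaluation scripts usually also normalise punctuation and casing before comparing, which this sketch only does for casing.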

What do I need to do to participate :clipboard:

To participate, simply fill out this short Google form. You will also need to create a Hugging Face Hub account here and join our Discord here. Make sure to head over to #role-assignment and click on ML for Audio and Speech.

This sprint should be especially interesting to native speakers of low-resource languages. Your language skills will help you select the best training data, and possibly build the best existing speech recognition system in your language.

More details will be announced in the Discord channel. We are looking forward to seeing you there!

What do I get :gift:

  • learn how to fine-tune state-of-the-art Whisper speech recognition checkpoints
  • free compute to build a powerful fine-tuned model under your name on the Hub
  • Hugging Face SWAG if you manage to build the best-performing model in a language
  • more GPU hours if you manage to have the best-performing model in a language

Open-sourcely yours,

Sanchit, VB & The HF Speech Team


I’m using it in Spanish and it works really well. So maybe fine-tuning isn’t needed for me… but

Is it possible to get help improving Whisper’s timestamps, and to work in that area?



@raulkite could you please share the code you use to run Whisper on Spanish audio?

There is no difference between English and Spanish. If your audio is in Spanish, your transcription will be in Spanish.
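As a rough illustration of that point, here is a minimal sketch using the `transformers` ASR pipeline. The checkpoint `openai/whisper-small`, the file name `audio_es.wav`, and the helper names `build_generate_kwargs` and `transcribe` are assumptions made for the example, and passing `language`/`task` through `generate_kwargs` assumes a reasonably recent `transformers` release.

```python
def build_generate_kwargs(language=None, task="transcribe"):
    """Build generation options for a Whisper pipeline call (hypothetical helper)."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    kwargs = {"task": task}
    # Leaving the language unset lets Whisper detect it from the audio.
    if language is not None:
        kwargs["language"] = language
    return kwargs


def transcribe(audio_path, model_id="openai/whisper-small", language=None, task="transcribe"):
    """Run a Whisper checkpoint on one audio file and return the text."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language, task))
    return result["text"]


if __name__ == "__main__":
    # Same call for English or Spanish audio; only the file changes.
    print(transcribe("audio_es.wav"))
```

Setting `language="spanish"` explicitly should only matter when auto-detection picks the wrong language, for example on very short or noisy clips.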


I need good per-word timestamp accuracy in Whisper’s transcriptions.

I have seen that fine-tuning Whisper with Hugging Face :hugs: seems easy for other languages, so I thought that getting better timestamp accuracy this way might be a feasible task.

It could be “easy” to create a dataset of aligned long audio files with tools like Gentle (GitHub: lowerquality/gentle, a forced aligner).
I have experience with this.
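A minimal sketch of how that dataset step might look, assuming the aligner returns JSON with a `words` list whose entries carry `word`, `case`, `start` and `end` fields (that is how I remember Gentle's output; check it against your version). `words_to_chunks` and the 30-second window are my own choices for illustration.

```python
import json


def words_to_chunks(aligner_json: str, max_seconds: float = 30.0):
    """Group aligned words into chunks of at most max_seconds, with relative timestamps."""
    words = [
        {"word": w["word"], "start": float(w["start"]), "end": float(w["end"])}
        for w in json.loads(aligner_json).get("words", [])
        if w.get("case") == "success"  # skip words the aligner could not place
    ]

    chunks, current = [], []
    for w in words:
        # Start a new chunk when adding this word would exceed the window.
        if current and w["end"] - current[0]["start"] > max_seconds:
            chunks.append(current)
            current = []
        current.append(w)
    if current:
        chunks.append(current)

    # Re-base every timestamp on the start of its own chunk.
    return [
        {
            "offset": c[0]["start"],
            "text": " ".join(w["word"] for w in c),
            "words": [
                {
                    "word": w["word"],
                    "start": round(w["start"] - c[0]["start"], 3),
                    "end": round(w["end"] - c[0]["start"], 3),
                }
                for w in c
            ],
        }
        for c in chunks
    ]
```

Each chunk then pairs a slice of audio (starting at `offset`) with its text and per-word times, which could serve as training targets for a timestamp head.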

Adding some layers on top of the model to train this new output also seems possible.

Is anyone working on this? Am I wrong?

If someone is working on this please ping me.




I would rather improve the translation from my language (Telugu) to English. Can we fine-tune the translation task at this point? If yes, where do I specify it? Thanks.