Whisper model fine-tuning

There is an excellent blog post by @sanchit-gandhi on fine-tuning the Whisper model on a Hindi (Devanagari) dataset.
I need to fine-tune the Whisper model on a Hinglish dataset (a mix of English and Hindi). The supervised data contains Hinglish annotations in Roman script (not Devanagari).
Is there a way to fine-tune the Whisper model on this dataset? Can I replace the model's tokenizer with my own custom tokenizer to proceed with the fine-tuning process?
Please suggest a way to proceed with this task.

Hey @Ankit-Kumar-Saini!

To clarify, does your dataset contain Hindi characters and Roman ones (i.e. the letters a-z)? Just Hindi ones? Or just Roman ones (a-z)?

The likelihood is we won't need a new tokenizer - the Whisper tokenizer already covers the Hindi alphabet and the Roman alphabet, among others. It's just how we initialise the tokenizer that changes.
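For example, here is a rough sketch (assuming the transformers WhisperTokenizer API, with the small checkpoint and the sentence as placeholders) showing that the text tokens stay the same regardless of the language setting - only the special prefix tokens change:

```python
from transformers import WhisperTokenizer

# Same vocabulary either way - only the special language/task prefix tokens differ
tok_hi = WhisperTokenizer.from_pretrained("openai/whisper-small", language="hindi", task="transcribe")
tok_en = WhisperTokenizer.from_pretrained("openai/whisper-small", language="english", task="transcribe")

text = "aapka age kitna hai"  # placeholder Hinglish sentence
print(tok_hi.decode(tok_hi(text).input_ids, skip_special_tokens=False))
print(tok_en.decode(tok_en(text).input_ids, skip_special_tokens=False))
# Both reproduce the sentence; they differ only in the <|hi|> vs <|en|> language prefix token.
```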

FYI, this is the updated blog post: Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

The dataset contains only Roman ones (i.e. the letters a-z).
For example: “aapka age kitna hai”.
How should I tokenize this text?

Okay! I probably wouldn't build a new tokeniser - I would first try using the existing Whisper one, as it contains the entire Roman alphabet and more words in word-piece form.

You need to change this part of the script:
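i.e. the line where the processor is loaded with the language and task set for Hindi, which looks roughly like:

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="Hindi", task="transcribe")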

I would first try omitting the language and task arguments:

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

If you train for long enough, the model should learn the correct output alphabet.
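As a quick sanity check, here is a rough sketch (swap in your own checkpoint and a sentence from your data) for confirming that the stock vocabulary already covers Roman-script Hinglish:

```python
from transformers import WhisperProcessor

# Sketch: round-trip a Hinglish transcript through the stock tokenizer
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

text = "aapka age kitna hai"
label_ids = processor.tokenizer(text).input_ids

print(processor.tokenizer.convert_ids_to_tokens(label_ids))             # sub-word pieces
print(processor.tokenizer.decode(label_ids, skip_special_tokens=True))  # should match `text`
```

If the decoded string comes back unchanged, the existing vocabulary handles your transcripts and you can build the labels for fine-tuning the same way as in the blog post, with no custom tokenizer.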

Hi @sanchit-gandhi, thanks for your blog post - it was excellent. However, I am facing an issue fine-tuning the Whisper model on my data and need your support. I have audio files and a corresponding metadata CSV file with two columns: wav_filename, which contains the paths to the audio files, and transcript, which contains the corresponding text. I am getting the following error while running the code: FileNotFoundError: [Errno 2] No such file or directory: 'wav_filename'. Any help would be greatly appreciated. Thanks.

Hey @Deveshp! Welcome to the HF forum and thanks for asking a great question! :hugs: Would you mind opening a new forum post for your question so we can shift the discussion there?

The reason is that it makes searching for issues much easier if we keep each forum post related to one question. That way, if someone has the same issue later down the line, they can sift through the previous posts and hopefully find our discussion! Thanks!

Hey Ankit, I know I am replying to a two-year-old post, but this is exactly what I needed and Google gave me this link. Have you found a solution? What was the WER, and how long did you train? Most research and ChatGPT suggest it's better to train on Devanagari, but the multilingual base model of even Whisper 3 is so bad that I am not optimistic anymore. Preparing the data is going to consume so much time that I need to be sure it will be worth it, LOL.