Whisper model fine-tuning

There is an excellent blog post on fine-tuning the Whisper model on a Hindi (Devanagari) dataset by @sanchit-gandhi.
I need to fine-tune the Whisper model on a Hinglish dataset (a mix of English and Hindi). The supervised data contains Hinglish annotations in the Roman script (not Devanagari).
Is there a way to fine-tune the Whisper model on this dataset? Can I replace the model's tokenizer with my own custom tokenizer to proceed with fine-tuning?
Please suggest a way to proceed with the task.

Hey @Ankit-Kumar-Saini!

To clarify: does your dataset contain both Hindi characters and Roman ones (i.e. the letters a-z), just Hindi ones, or just Roman ones?

In all likelihood we won't need a new tokenizer: the Whisper tokenizer covers the Hindi and Roman alphabets, among others. It's just how we initialise the tokenizer that changes.
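
You can sanity-check the coverage yourself with a quick snippet (a sketch, using the whisper-small checkpoint from the blog post; Whisper's byte-level BPE should round-trip any UTF-8 string):

from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

for text in ["आपका नाम क्या है", "aapka age kitna hai"]:
    ids = tokenizer(text).input_ids
    # If decoding recovers the original string, every character is representable
    print(tokenizer.decode(ids, skip_special_tokens=True))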

FYI this is the updated blog post: Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

The dataset contains only Roman characters (i.e. the letters a-z).
For example: “aapka age kitna hai”.
How should I tokenize this text?

Okay! I probably wouldn't build a new tokeniser. I would first try using the existing Whisper one, as it contains the entire Roman alphabet, plus many common words in word-piece form.
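
To see how the existing tokeniser handles your example (a sketch, again assuming the whisper-small checkpoint):

from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

# Byte-level BPE splits the Romanised Hindi into sub-word pieces already in
# the vocabulary, so nothing falls back to an unknown token
print(tokenizer.tokenize("aapka age kitna hai"))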

You need to change the part of the script where the processor is initialised, i.e. this line from the blog post:

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="Hindi", task="transcribe")

I would first try omitting the language and task arguments:

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

If you train for long enough, the model should learn the correct output alphabet.
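
It also helps to make sure nothing is forced or suppressed at generation time, so the model is free to emit the Romanised Hindi it learned. A minimal sketch (the two config settings follow the blog post):

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Don't force a <|language|><|task|> prefix and don't suppress any tokens
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []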