How to finetune Whisper with language which is not supported in WhisperTokenizer

nathanhunt · May 9, 2023, 5:55am

According to "Fine-Tune Whisper For Multilingual ASR with Transformers", I can fine-tune the Whisper model on languages supported by WhisperTokenizer.

However, if we need it to support a new language (one that is not supported by the tokenizer), how can I do that? Could you please point me to a document or example that I could follow?

bozden · May 20, 2023, 2:43pm

I’m also searching for the same answer.

nitinvw · September 12, 2023, 2:56pm

Yes, I’m interested in this too, particularly for very low-resource languages like Wolof. Do you need to train a BPE tokenizer on Wolof transcriptions, then pass the vocab.json and the merges file to the WhisperTokenizer?
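
For illustration only, here is a minimal sketch of what "training a BPE tokenizer on transcriptions" produces: a vocabulary mapping (the kind of data stored in vocab.json) and an ordered list of merges (the kind stored in the merges file). The function name `train_bpe` and the sample strings are hypothetical, and the sketch works on characters, whereas Whisper's tokenizer is byte-level, so treat it as a toy rather than a drop-in replacement. Whether files produced this way work well with a fine-tuned Whisper model is not confirmed anywhere in this thread.

```python
from collections import Counter


def train_bpe(transcriptions, num_merges):
    """Toy character-level BPE trainer (hypothetical helper, not a library API).

    Returns (vocab, merges): a symbol -> id mapping and the ordered merge list.
    """
    # Represent every whitespace-separated word as a tuple of single characters.
    words = Counter()
    for line in transcriptions:
        for word in line.split():
            words[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol

        # Most frequent pair wins; ties are broken lexicographically
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)

        # Rewrite every word with the chosen pair fused into one symbol.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words

    # Vocabulary: base characters first (sorted), then merged symbols in merge order.
    symbols = sorted({ch for line in transcriptions for ch in line if not ch.isspace()})
    for a, b in merges:
        if a + b not in symbols:
            symbols.append(a + b)
    vocab = {sym: idx for idx, sym in enumerate(symbols)}
    return vocab, merges


# Placeholder strings standing in for real transcriptions.
vocab, merges = train_bpe(["aa aa aa", "ab"], num_merges=1)
```

In practice one would more likely use an existing tokenizer-training library than hand-rolled code like this. Note also that any new vocabulary entries have no trained embeddings in the pretrained model, so changing the vocabulary implies retraining those parts of the network.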

showgan · April 5, 2024, 3:14pm

Did you by any chance figure out a solution? I’m in the same situation now with a different language, and I wonder if you have some advice for me.
Thanks in advance!

anzorq · May 18, 2024, 4:15am

Has anyone been able to fine-tune Whisper on a new language?
