How to fine-tune Whisper with a language that is not supported by WhisperTokenizer

According to Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers, I can fine-tune the Whisper model with languages supported by WhisperTokenizer.

However, if we need to make it support a new language (one that is not supported by the tokenizer), how could I do that? Could you please point me to a document or example that I could follow?


I’m also searching for the same answer.

Yes, I’m interested in this too, particularly for very low-resource languages like Wolof. Do you need to train a BPE tokenizer on Wolof transcriptions, then pass the resulting vocab.json and merges.txt files to the WhisperTokenizer? Roughly along the lines of the sketch below.
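For what it’s worth, here is a rough, untested sketch of that idea using the tokenizers library. The transcription file, output directory, and vocab size are all placeholders, and note the caveat: a brand-new vocabulary no longer lines up with the pre-trained decoder embeddings, so the model’s token embeddings would have to be resized before fine-tuning.

```python
# Sketch: train a byte-level BPE tokenizer on Wolof transcriptions,
# then load the resulting files into WhisperTokenizer.
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import WhisperTokenizer

# Train on a plain-text file of transcriptions (one utterance per line).
# "wolof_transcriptions.txt" and vocab_size=8000 are placeholder choices.
bpe = ByteLevelBPETokenizer()
bpe.train(
    files=["wolof_transcriptions.txt"],
    vocab_size=8000,
    min_frequency=2,
    special_tokens=["<|endoftext|>", "<|startoftranscript|>"],
)

# save_model() writes vocab.json and merges.txt into the directory.
os.makedirs("wolof_tokenizer", exist_ok=True)
bpe.save_model("wolof_tokenizer")

# WhisperTokenizer accepts vocab_file and merges_file directly.
tokenizer = WhisperTokenizer(
    vocab_file="wolof_tokenizer/vocab.json",
    merges_file="wolof_tokenizer/merges.txt",
)

# Since the vocabulary changed, the model's embedding matrix must be
# resized to match before fine-tuning, e.g.:
# model.resize_token_embeddings(len(tokenizer))
```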

Did you by any chance figure out a solution? I’m in the same situation now with a different language, and I wonder if you have any advice for me.
Thanks in advance!

I’m wondering this too; any solution would be greatly appreciated!