Whisper finetune for multilingual tasks

IN4 · February 21, 2024, 4:41am

Hello community.

I’m using whisper-large-v2 for audio transcription.
I need to transcribe audio in Kazakh and Russian.
The problem is that in Kazakhstan people often mix Russian words into Kazakh speech (code-switching). I tried fine-tuning the model on a dataset that was 50% Kazakh and 50% Russian, but the results were unsatisfactory: for example, the model would identify the audio as Kazakh and then fail to transcribe the Russian words in it, and vice versa.
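For reference, the 50/50 mix described above can be sketched roughly like this. It is a hypothetical illustration, not a full training pipeline: the example fields (`audio`, `text`) and the `language` key are assumptions, with `language` standing for the language token (`<|kk|>` or `<|ru|>`) that a Whisper fine-tuning setup would place in the decoder prompt for each example.

```python
import random
from typing import Dict, List


def mix_datasets(
    kk: List[Dict[str, str]], ru: List[Dict[str, str]], seed: int = 0
) -> List[Dict[str, str]]:
    """Build a 50/50 Kazakh/Russian training set (hypothetical helper).

    Each example gets exactly one language tag, which a fine-tuning script
    would map to Whisper's <|kk|> or <|ru|> language token.
    """
    n = min(len(kk), len(ru))  # equal counts from each side -> 50% / 50%
    rng = random.Random(seed)
    kk_part = [dict(ex, language="kk") for ex in rng.sample(kk, n)]
    ru_part = [dict(ex, language="ru") for ex in rng.sample(ru, n)]
    mixed = kk_part + ru_part
    rng.shuffle(mixed)  # interleave the two languages
    return mixed
```

In this scheme every utterance, including a code-switched one, receives exactly one language tag.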
Question:
Would it be acceptable to create a tokenizer with a combined “ru-kk” language format, so that the two languages are treated as one? And would it be possible to fine-tune the base model with such a tokenizer?
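To make the question concrete: Whisper conditions its decoder on a short prefix of special tokens (start-of-transcript, a language token, the task token, and optionally no-timestamps). The sketch below, which assumes a plain list-of-strings representation instead of real token IDs, shows where a hypothetical combined `<|ru-kk|>` token would sit. That token is not part of Whisper's vocabulary; `decoder_prefix` is an illustrative helper, not a library function.

```python
from typing import Dict, List, Optional

# Whisper's special tokens for the decoder prompt (strings only, no token IDs).
SOT = "<|startoftranscript|>"
TRANSCRIBE = "<|transcribe|>"
NO_TIMESTAMPS = "<|notimestamps|>"

# Language tokens that exist in the original Whisper vocabulary.
BASE_LANGUAGE_TOKENS: Dict[str, str] = {"kk": "<|kk|>", "ru": "<|ru|>"}


def decoder_prefix(language: str, extra_tokens: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the special-token prefix the decoder sees before the transcript text.

    `extra_tokens` lets us register a hypothetical new language token such as <|ru-kk|>.
    """
    vocab = dict(BASE_LANGUAGE_TOKENS)
    if extra_tokens:
        vocab.update(extra_tokens)
    if language not in vocab:
        raise KeyError(f"no language token for {language!r}")
    return [SOT, vocab[language], TRANSCRIBE, NO_TIMESTAMPS]
```

For example, `decoder_prefix("kk")` gives the standard Kazakh prefix, while `decoder_prefix("ru-kk", {"ru-kk": "<|ru-kk|>"})` swaps in the combined token. A newly added token starts with an untrained embedding, so it would only carry meaning after fine-tuning on code-switched Kazakh/Russian data.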
