Issue of the prediction output of the Whisper model

mungsik · March 13, 2024, 4:04am

I fine-tuned the Whisper model on Korean speech data.

1 (cer=2.0125)

pred: Right, actually, this side is more than 90% talent. The only thing that can be covered by effort is technology. But no matter how good the technology is, people who listen may hate it.
real: 그치 사실 이쪽은 재능이 구십 퍼 이상이지 노력으로 카바 할 수 있는 건 기술 쪽이고 근데 기술이 아무리 좋아도 듣는 사람들이 싫어할 수 있잖아

However, as in the example above, the model sometimes outputs an English translation instead of a Korean transcription, which pushes the CER above 1. What could be the reason for this?
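To count how often this happens across an evaluation set, a rough check like the one below could flag predictions that contain little or no Hangul. This is only a sketch: the function names and the 0.5 threshold are my own assumptions, and it only looks at precomposed Hangul syllables (U+AC00 to U+D7A3).

```python
def hangul_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are precomposed Hangul syllables."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hangul = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")
    return hangul / len(letters)


def looks_translated(pred: str, threshold: float = 0.5) -> bool:
    """Heuristic: a prediction with mostly non-Hangul letters was probably
    not transcribed in Korean. The threshold is an arbitrary assumption.
    Note that an empty prediction is also flagged."""
    return hangul_ratio(pred) < threshold


if __name__ == "__main__":
    preds = [
        "Right, actually, this side is more than 90% talent.",
        "그런 게 있는데 내가 그거를 살려고 했었어",
    ]
    for p in preds:
        print(looks_translated(p), p)
```

Running this over all predictions would show whether the English outputs are rare outliers or a sizable share of the set.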

18 (cer=0.03529411764705882)

pred: 그런 게 있는데 내가 그거를 살려고 했었어 거기가 일본에 일본에서 만드는 거니까 일본에서만 나오는 그게 있거든 근데 그걸 살려고 봤는데 예쁘긴 예쁜데
real: 그런 게 있는데 내가 그거를 살려고 했었어 거기가 일본 아니 일본에서 만드는 거니까 일본에서만 나오는 그게 있거든 근데 그걸 살려고 봤는데 예쁘긴 예쁜데
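
For reference, here is a minimal sketch of a character error rate computation, assuming plain character-level Levenshtein distance divided by the reference length, with no text normalization. The scores in this post may have been computed with different normalization, so the exact numbers will not necessarily match. It illustrates why an English output scored against a Korean reference can give a CER above 1: almost every character has to be substituted or deleted, and the English text is longer than the Korean reference.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("가나다", "가나다"))        # identical strings
    print(cer("가나다", "abcdefgh"))      # no shared characters, longer hypothesis
```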

When the model does transcribe Korean as Korean, as in the second example, the CER (character error rate) is low. The model and parameters I used for fine-tuning are as follows:

model: whisper-large-v2 + peft
parameters:
batch_size = 128
learning_rate = 1e-3
warmup_steps = 50
gradient_accumulation_steps = 2

If more information is needed, I will provide it promptly.
