Will transcript errors in original common_voice_16 Farsi affect training Whisper?

Pardner · January 17, 2024, 11:04pm

I am interested in using Whisper to translate some Farsi video/audio for my wife. The first pass with large-v3 was not as good as we had hoped, which led me to HF to learn how to fine-tune Whisper.

I took a quick look through some of the common_voice_16_0 transcripts, and my wife immediately pointed out that many of the segments in the original “transcript_fa_train.tsv” file had spelling errors, and some of the segments didn’t exactly match the audio. For example: ‘cannot’ ≠ ‘can not’ or ‘gooood byeeee’ ≠ ‘good bye’.
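To give a sense of how the second kind of mismatch could be spotted without reading every row, here is a minimal sketch that flags transcripts containing a character repeated three or more times in a row. It assumes the TSV has a header row with a `sentence` column holding the transcript; the function name `flag_elongated` and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
import re

# Three or more identical non-space characters in a row, e.g. "gooood".
ELONGATED = re.compile(r"(\S)\1{2,}")


def flag_elongated(tsv_text, column="sentence"):
    """Return (row_number, text) pairs whose text has a stretched character run.

    `column` is an assumed header name; adjust it to match the actual file.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        (i, row[column])
        for i, row in enumerate(reader, start=1)
        if ELONGATED.search(row[column])
    ]


# Hypothetical two-row sample standing in for the real TSV contents.
sample = "path\tsentence\na.mp3\tgood bye\nb.mp3\tgooood byeeee\n"
flagged = flag_elongated(sample)
print(flagged)
```

This only catches stretched spellings. It would not catch ordinary misspellings or segments where the text and the audio simply disagree, which still need a native speaker to review.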

This has made me leery of spending time (and money) fine-tuning Whisper models on Common Voice Farsi if the input files are garbage.

Are errors in the training transcripts a problem when fine-tuning Whisper? I had assumed the transcripts provided with common_voice were to be treated as 100% correct.

Thank you,
Pard
