Will transcript errors in the original common_voice_16 Farsi data affect training Whisper?

I am interested in using Whisper to translate some Farsi video/audio for my wife. The first pass with large-v3 was not as good as we had hoped, which led me to HF to learn how to fine-tune Whisper.

I did a quick run through some of the common_voice_16_0 transcripts, and my wife immediately pointed out that a lot of the segments in the original “transcript_fa_train.tsv” file had spelling errors, and some of the segments didn’t exactly match the audio. For example: ‘cannot’ ≠ ‘can not’ or ‘gooood byeeee’ ≠ ‘good bye’.
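
To get a rough sense of how widespread the problem is, something like the following could flag suspect rows. This is only a sketch under assumptions: it assumes the transcript text lives in a tab-separated column named `sentence` (not confirmed for this file), the function name `flag_suspect_rows` is made up for illustration, and it only catches stretched-out spellings like ‘gooood’, not genuine misspellings or audio mismatches. It will also flag legitimate runs such as "1000".

```python
import csv
import io
import re

# Three or more of the same non-space character in a row, e.g. "gooood".
ELONGATED = re.compile(r"(\S)\1{2,}")


def flag_suspect_rows(tsv_text, column="sentence"):
    """Return (row_number, sentence) pairs whose text contains an elongated run.

    `column` is an assumed column name; adjust it to match the real header.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row_number, row in enumerate(reader, start=1):
        sentence = row.get(column) or ""
        if ELONGATED.search(sentence):
            flagged.append((row_number, sentence))
    return flagged


# Tiny in-memory stand-in for the real TSV file.
sample = "path\tsentence\na.mp3\tgood bye\nb.mp3\tgooood byeeee\n"
print(flag_suspect_rows(sample))
```

For the real file, the text could be read with `open(path, encoding="utf-8").read()` and passed in the same way; the flagged rows would still need a native speaker to review them.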

This has made me leery of spending time (and money) fine-tuning Whisper models on Common Voice Farsi if the input files are garbage.

Are errors in the training transcripts a problem when fine-tuning Whisper? I had assumed the transcripts provided with Common Voice were to be treated as 100% correct.

Thank you,
Pard