I am fine-tuning Whisper on Korean, following the code from this blog post:
Fine-Tune Whisper For Multilingual ASR with Transformers
from transformers import WhisperFeatureExtractor, WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer
# DataCollatorSpeechSeq2SeqWithPadding is the helper class defined in the blog post
processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="Korean", task="transcribe")
data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="Korean", task="transcribe")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
The pre-trained model already works well for Korean.
When I load it this way and fine-tune it briefly on a small dataset, it performs remarkably well.
But there are two things I would like to change.
The model adds "." and "?" appropriately depending on the speech, but I would like to remove this punctuation.
=> Should I post-process the string inferred by the model?
I first tried deleting these characters from the tokenizer's vocabulary, but that failed.
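For reference, the post-processing option could look roughly like the sketch below. It is only an illustration, not something taken from the blog post: the function name `strip_punctuation` and the `PUNCTUATION` constant are made up here, and it assumes that only "." and "?" need to go. Note that it would also delete the dot in a decimal such as "3.5", so the character set may need adjusting.

```python
import re

# Characters to drop; an assumption based on the two marks mentioned above.
PUNCTUATION = ".?"


def strip_punctuation(text: str, chars: str = PUNCTUATION) -> str:
    """Remove the given characters, then collapse any leftover whitespace."""
    cleaned = text.translate(str.maketrans("", "", chars))
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_punctuation("안녕하세요. 잘 지내세요?"))  # 안녕하세요 잘 지내세요
```

If the same function were also applied to the reference transcripts before fine-tuning and before computing the metric, the labels and the predictions would at least follow the same convention.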
The model also seems to recognize numbers that could be spelled out in Korean, and it writes them as digits rather than in Hangul.
=> The result I want: 사백 만
The model's output: 4백 만
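One way the digits could be handled in post-processing is sketched below. This is a rough illustration under several assumptions, and all names in it (`to_sino_korean`, `spell_out_digits`, and so on) are hypothetical. It reads every run of digits as a Sino-Korean number, drops the leading "일" before 십/백/천, and ignores native Korean numerals used with counters, decimals, and digit strings that are read one by one, such as phone numbers.

```python
import re

DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # positions inside a 4-digit group
BIG_UNITS = ["", "만", "억", "조"]    # one unit per 4-digit group


def read_group(n: int) -> str:
    """Read a number in the range 0..9999 (0 gives an empty string)."""
    parts = []
    for power in range(3, -1, -1):
        digit = (n // 10**power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            # Assumption: drop "일" before 십/백/천, so 100 reads as 백.
            parts.append(SMALL_UNITS[power])
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def to_sino_korean(n: int) -> str:
    """Spell out a non-negative integer below 10**16 in Sino-Korean."""
    if n < 0 or n >= 10**16:
        raise ValueError("number out of supported range")
    if n == 0:
        return "영"
    groups = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            groups.append(read_group(group) + BIG_UNITS[index])
        index += 1
    return "".join(reversed(groups))


def spell_out_digits(text: str) -> str:
    """Replace each run of ASCII digits with its Sino-Korean reading."""
    return re.sub(r"[0-9]+", lambda m: to_sino_korean(int(m.group())), text)


print(spell_out_digits("4백 만"))  # 사백 만
```

Because each digit run is converted on its own, a mixed form like "4백" becomes "사백", while a fully numeric "400" also becomes "사백".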
Any ideas on this would be appreciated.