Hello.
I am fine-tuning Whisper for Korean by following the code from the blog post "Fine-Tune Whisper For Multilingual ASR with Transformers".
from transformers import WhisperFeatureExtractor, WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="Korean", task="transcribe")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="Korean", task="transcribe")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# DataCollatorSpeechSeq2SeqWithPadding is the collator class defined in the blog post
data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)
The pre-trained model already works well for Korean. When I load it this way and fine-tune it even on a small dataset, it shows impressive performance.
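For context, this is roughly how I run inference (a minimal sketch; audio_array is just a placeholder for a 16 kHz mono waveform from my dataset):

import torch

# audio_array: placeholder for a 16 kHz mono waveform loaded from my dataset
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features
forced_decoder_ids = processor.get_decoder_prompt_ids(language="korean", task="transcribe")
with torch.no_grad():
    predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)  # this is where the punctuation and digit forms described below show up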
But there are two things I would like to change.
First, "." and "?" are added to the transcription depending on the speech, but I would like to remove them.
=> Should I post-process the string inferred by the model? I initially tried to delete these tokens from the tokenizer's vocabulary, but that failed.
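For reference, this is the kind of post-processing I am considering (a minimal sketch; the set of characters to strip is my own guess, not something from the blog post):

import re

def strip_punctuation(text: str) -> str:
    # remove the sentence punctuation the model adds, e.g. "." and "?"
    return re.sub(r"[.?,!]", "", text).strip()

print(strip_punctuation("안녕하세요?"))  # hypothetical output -> "안녕하세요"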
Second, the model seems to know which numbers can be written with digits in Korean and outputs them as digits rather than as Korean words.
=> The result I want is 사백 만, but the model inferred 4백 만.
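The other direction I am considering is post-processing the decoded string to rewrite digits as Korean words. The digit-to-word mapping below is only a hand-written sketch to show what I mean, not a complete converter:

# Map single digits back to Sino-Korean number words, so "4백 만" becomes "사백 만".
# A real converter would also need to handle multi-digit numbers.
DIGIT_TO_KOREAN = {
    "1": "일", "2": "이", "3": "삼", "4": "사", "5": "오",
    "6": "육", "7": "칠", "8": "팔", "9": "구",
}

def digits_to_korean(text: str) -> str:
    return "".join(DIGIT_TO_KOREAN.get(ch, ch) for ch in text)

print(digits_to_korean("4백 만"))  # -> "사백 만"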
Any ideas on either of these would be appreciated.