Modifying Whisper Using Domain-Specific Attention

Hi there, I am currently using OpenAI’s Whisper model to perform speech-to-text transcription. Much of my input data comes from language and terms that are specific to a particular industry/domain - biology. As such, when I have audio that contains biology terms, the model does not always transcribe them correctly, and sometimes skips over them entirely. The obvious solution seems to be to fine-tune Whisper on my own dataset, as in Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers, or potentially to train a new tokenizer (though then I would lose Whisper’s pretrained weights). However, I do not have access to a large training corpus pairing the necessary biological terms with their audio files.
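To make the tokenizer point concrete: as I understand it, one alternative to training a new tokenizer would be to extend Whisper’s existing one with the domain terms and resize the embedding matrix, which keeps the pretrained weights. A minimal sketch of what I think that would look like with the standard transformers API is below (the checkpoint name and the example terms are placeholders), but the new embedding rows would still be random, which brings me back to needing paired audio/text data to train them:

```python
# Sketch (my understanding, not a tested recipe): extend Whisper's existing
# tokenizer with domain terms instead of replacing it, keeping pretrained weights.
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Placeholder domain terms that the stock tokenizer splits into many subwords.
new_terms = ["phosphorylation", "CRISPR-Cas9", "mitochondrial"]
num_added = processor.tokenizer.add_tokens(new_terms)

# Grow the decoder embedding matrix to cover the new token ids.
# The added rows are randomly initialized, which is exactly why some amount
# of paired audio/transcript data would still be needed to make them useful.
if num_added > 0:
    model.resize_token_embeddings(len(processor.tokenizer))
```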

So instead, I was thinking of exploring a domain-specific attention initialization approach, where I would initialize the attention with prior information about the biological terminology. Since the source of my problem seems to lie in the decoder part of Whisper’s transformer, I was thinking of modifying the attention part of its architecture. Please correct me if I am wrong and there is a better way. But if this is the right way to go about the problem, I am wondering how I would actually implement it.
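To make the question concrete, here is where I believe the relevant attention parameters live in the transformers implementation of Whisper (attribute names from WhisperForConditionalGeneration; the re-initialization at the end is only a placeholder for whatever prior-informed scheme would actually go there):

```python
# Sketch: locating the decoder attention weights in the transformers Whisper
# implementation. The modification at the end is a stand-in, purely to show
# where a domain-informed initialization would be applied.
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

for i, layer in enumerate(model.model.decoder.layers):
    # Each decoder layer has self-attention over previously generated tokens
    # and cross-attention over the encoder's audio features.
    self_attn = layer.self_attn      # q_proj / k_proj / v_proj / out_proj
    cross_attn = layer.encoder_attn  # cross-attention over audio features
    print(i, self_attn.q_proj.weight.shape, cross_attn.q_proj.weight.shape)

# Placeholder: perturb the query projection of the last layer's cross-attention.
# A real approach would replace this with an initialization derived from the
# domain terminology prior, which is the part I am unsure how to construct.
with torch.no_grad():
    q = model.model.decoder.layers[-1].encoder_attn.q_proj.weight
    q.add_(0.01 * torch.randn_like(q))
```

Is modifying these cross-attention projections the right place to inject such a prior, or would that just destroy the pretrained behaviour?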

@sanchit-gandhi do you maybe have input on this?