I wanted to transcribe dictation where the speaker says words like “paragraph”, “comma”, and “period”. The original models did so-so with that, more often than not mangling the dictation commands.
I created a dataset of about 1,000–2,000 examples (I took recordings and split them with Silero VAD into chunks of 30 s or less) and labeled them manually using Prodigy. Then I used the dataset to fine-tune the small.en model and got exceptional results.
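For anyone trying to reproduce the chunking step: a minimal sketch of how VAD output can be packed into ≤30 s chunks, assuming Silero VAD has already given you `(start, end)` speech timestamps in seconds (the function name and greedy strategy here are illustrative, not exactly what I ran):

```python
def pack_chunks(segments, max_len=30.0):
    """Greedily group consecutive VAD speech segments into chunks whose
    total span (first start to last end) stays within max_len seconds.
    A single segment longer than max_len still becomes its own chunk."""
    chunks, current = [], []
    for start, end in segments:
        # Flush the current chunk if adding this segment would exceed max_len.
        if current and end - current[0][0] > max_len:
            chunks.append((current[0][0], current[-1][1]))
            current = []
        current.append((start, end))
    if current:
        chunks.append((current[0][0], current[-1][1]))
    return chunks

# Example: four speech segments from VAD, packed into two chunks.
print(pack_chunks([(0.0, 5.0), (6.0, 12.0), (28.0, 35.0), (40.0, 45.0)]))
# → [(0.0, 12.0), (28.0, 45.0)]
```

Keeping chunks under 30 s matters because Whisper processes audio in 30-second windows, so each training example fits in one window.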
This is similar to your case: words the model was not originally particularly good at transcribing.