Deploy Whisper by passing last transcribed sentences to decoder's past_key_values

Hannan · March 20, 2023, 4:23pm

I’m working on using the Whisper model for real-time live transcription. To get a real-time feel, I have to run the model on overlapping audio chunks, i.e. every 1 second I run it on the last 5 seconds of audio.
For such a task, I have to merge the transcribed text, since each chunk overlaps the previous one by 4 seconds. Thus, the transcription at each 1-second time step shares some words with the previous ones.
There are several solutions for merging such transcribed texts, such as using a language model or dynamic programming.
But I have an idea: use the Whisper model itself to merge the text, since its decoder already acts as a language model.
I want to pass the previous transcriptions to the decoder’s past_key_values and have generation condition on the text produced at previous time steps.
Does anyone have an idea how I could implement this?
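
For context, here is a minimal sketch of the kind of text-level merge mentioned above. It is not the Whisper-based approach I am asking about. It assumes the two transcripts share an identical run of words, and it uses Python's `difflib` as a stand-in for a hand-written dynamic-programming alignment. The function name `merge_transcripts` is made up for illustration.

```python
from difflib import SequenceMatcher


def merge_transcripts(previous: str, current: str) -> str:
    """Merge two overlapping transcripts at their longest shared word run.

    Hypothetical helper: assumes the overlap is transcribed identically
    (same casing and punctuation) in both windows.
    """
    prev_words = previous.split()
    curr_words = current.split()

    # Compare word lists rather than characters; autojunk is disabled so
    # frequent words are not silently ignored on long inputs.
    matcher = SequenceMatcher(None, prev_words, curr_words, autojunk=False)
    match = matcher.find_longest_match(0, len(prev_words), 0, len(curr_words))

    if match.size == 0:
        # No shared words: fall back to plain concatenation.
        return " ".join(prev_words + curr_words)

    # Keep the previous text up to the end of the shared run, then append
    # whatever the current window contains after that run.
    kept = prev_words[: match.a + match.size]
    new_tail = curr_words[match.b + match.size:]
    return " ".join(kept + new_tail)


merged = merge_transcripts(
    "the quick brown fox jumps",
    "brown fox jumps over the lazy dog",
)
print(merged)
```

In practice the overlapping region is often transcribed slightly differently from one window to the next (casing, punctuation, a changed word), so exact matching like this is brittle, which is why I would rather let the model condition on the earlier text.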
