Run Whisper by passing the last transcribed sentences to the decoder's past_key_values

I’m working on using the Whisper model for real-time live transcription. To get a sense of real-time output, I have to run the model on overlapping audio chunks, i.e. every 1 second I transcribe the last 5 seconds of audio.
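To make the chunking concrete, here is a minimal sketch of the sliding window, assuming 16 kHz mono samples arrive one at a time from some stream. The function name `sliding_windows` and its parameters are hypothetical and only for illustration; a real pipeline would buffer whole frames rather than single samples.

```python
from collections import deque

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio
WINDOW_SECONDS = 5     # length of audio passed to the model each time
HOP_SECONDS = 1        # how often a new window is emitted


def sliding_windows(samples, sample_rate=SAMPLE_RATE,
                    window_seconds=WINDOW_SECONDS, hop_seconds=HOP_SECONDS):
    """Yield the most recent `window_seconds` of audio every `hop_seconds`."""
    window = deque(maxlen=window_seconds * sample_rate)
    hop = hop_seconds * sample_rate
    for count, sample in enumerate(samples, start=1):
        window.append(sample)
        # Emit as soon as the buffer is full, then once per hop after that.
        if len(window) == window.maxlen and (count - window.maxlen) % hop == 0:
            yield list(window)
```

With the defaults, consecutive windows share 4 seconds of audio, which is where the overlapping words in the transcriptions come from.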
This means I have to merge the transcribed text, since each chunk overlaps the previous one by 4 seconds. The transcription produced at each 1-second time step therefore shares some words with the previous ones.
There are several ways to merge such transcriptions, such as using a language model or dynamic programming.
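As a baseline for the text-matching route, here is a rough sketch that aligns two consecutive transcriptions on their longest shared run of words, using `difflib.SequenceMatcher` from the standard library. The function name `merge_transcripts` is hypothetical, and the sketch assumes the overlapping words are transcribed identically in both outputs (same casing and punctuation), which will not always hold in practice.

```python
from difflib import SequenceMatcher


def merge_transcripts(previous, current):
    """Join two transcriptions on their longest shared run of words."""
    prev_words = previous.split()
    curr_words = current.split()
    matcher = SequenceMatcher(a=prev_words, b=curr_words, autojunk=False)
    match = matcher.find_longest_match(0, len(prev_words), 0, len(curr_words))
    if match.size == 0:
        # No shared words: fall back to plain concatenation.
        return " ".join(prev_words + curr_words)
    # Keep `previous` up to the end of the shared run, then continue with
    # whatever follows that run in `current`.
    merged = prev_words[: match.a + match.size] + curr_words[match.b + match.size:]
    return " ".join(merged)


print(merge_transcripts("the quick brown fox jumps",
                        "brown fox jumps over the lazy dog"))
```

Anything in the previous text after the shared run is replaced by the newer transcription, on the assumption that the newer window saw more audio context for those words.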
But I have an idea: use the Whisper model itself to merge the text, since its decoder already acts as a language model.
I want to pass the previous transcriptions to the decoder’s past key values (past_key_values) so that generation is conditioned on the text produced at previous time steps.
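As far as I can tell, the closest existing mechanism in Hugging Face Transformers is prompting: `WhisperProcessor.get_prompt_ids` turns text into prompt tokens, and `generate` accepts them through `prompt_ids`. This feeds the earlier text to the decoder as tokens rather than injecting `past_key_values` directly, so it may not be exactly what I describe above. A minimal sketch, assuming the `openai/whisper-small` checkpoint and 16 kHz audio as a 1-D float array (the helper names `load_whisper` and `transcribe_with_context` are hypothetical):

```python
def load_whisper(model_name="openai/whisper-small"):
    """Load a Whisper processor and model (the checkpoint name is an assumption)."""
    # Imported lazily so the heavy dependency is only needed when this runs.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)
    return processor, model


def transcribe_with_context(processor, model, audio, previous_text):
    """Transcribe `audio`, conditioning the decoder on `previous_text`."""
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    # Encode the earlier transcription as prompt tokens for the decoder.
    prompt_ids = processor.get_prompt_ids(previous_text, return_tensors="pt")
    generated = model.generate(inputs.input_features, prompt_ids=prompt_ids)
    return processor.batch_decode(generated, skip_special_tokens=True)[0]


# Intended use (downloads the checkpoint on first call):
#   processor, model = load_whisper()
#   text = transcribe_with_context(processor, model, audio, "previous transcription")
```

Note that the prompt only biases the output toward the earlier wording. The model still transcribes the full 5-second window, so the overlapping words would still have to be removed afterwards.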
Do you have any idea how I can implement this?