Brief: I’m unable to transcribe more than the first few seconds of a 5-minute audio file using a fine-tuned OpenAI Whisper model from Hugging Face.
I’m having trouble transcribing an audio file in an Indian local language with this Hugging Face model: thennal/whisper-medium-ml. It only transcribes the first few seconds, but I would like the entire file transcribed. I’m running this on Google Colab.
What I have tried
First, the code that only transcribes the first few seconds:
print(pipe("/content/audio.mp3"))
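The call above passes the whole file to the pipeline without chunking. Whisper encodes a fixed 30-second window of audio per forward pass, so without chunking the pipeline would likely transcribe only roughly the first 30 seconds, which matches the symptom. Below is a minimal sketch that turns on the pipeline's chunked mode through the chunk_length_s argument. It assumes a recent transformers and torch install on Colab, with ffmpeg available for mp3 decoding. The function name transcribe_long and the batch_size default are illustrative choices, not part of the original post.

```python
# Hedged sketch: chunked long-form transcription with the transformers ASR pipeline.
# Assumes `transformers` and `torch` are installed and ffmpeg can decode mp3
# (as on Google Colab). Model ID and audio path are taken from the question.

MODEL_ID = "thennal/whisper-medium-ml"


def transcribe_long(path: str, chunk_length_s: int = 30, batch_size: int = 8) -> str:
    """Transcribe an audio file of any length by letting the pipeline chunk it.

    transcribe_long is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # first GPU if present, else CPU
    pipe = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=chunk_length_s,  # split audio into ~30 s windows (Whisper's input size)
        device=device,
    )
    # batch_size lets the pipeline run several chunks through the model at once.
    result = pipe(path, batch_size=batch_size)
    return result["text"]


# Usage (on Colab): print(transcribe_long("/content/audio.mp3"))
```

The pipeline also overlaps neighbouring chunks (controlled by its stride_length_s argument) and merges the pieces, so words cut at a chunk boundary should not simply be dropped.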
I also tried setting the model’s max_new_tokens, but that produced an even shorter transcription, and I wasn’t able to set it above 500.
I also tried setting DEFAULT_INPUT_AUDIO_MAX_DURATION = 300, which resulted in an error.
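Whatever limit that constant sets in the app code, Whisper itself encodes at most 30 seconds of audio per forward pass, so a 300-second (5-minute) file has to be split into several windows. The arithmetic can be sketched in plain Python as follows. chunk_windows and to_sample_ranges are hypothetical helpers for illustration, not the pipeline's internal code. The optional overlap mimics the general idea behind overlapping chunks: a word cut at one boundary still appears whole in the neighbouring window.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def chunk_windows(total_s: float, chunk_s: float = 30.0,
                  overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Return (start_s, end_s) windows of at most chunk_s seconds covering
    [0, total_s], each overlapping the previous one by overlap_s seconds."""
    if total_s <= 0:
        return []
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be in [0, chunk_s)")
    step = chunk_s - overlap_s  # how far each new window advances
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)  # last window may be shorter
        windows.append((start, end))
        if end >= total_s:
            return windows
        start += step


def to_sample_ranges(windows: list[tuple[float, float]],
                     sample_rate: int = SAMPLE_RATE) -> list[tuple[int, int]]:
    """Convert second ranges to [start, end) sample indices for slicing audio arrays."""
    return [(int(s * sample_rate), int(e * sample_rate)) for s, e in windows]


# A 5-minute (300 s) file with no overlap: 10 back-to-back 30 s windows.
print(len(chunk_windows(300.0, overlap_s=0.0)))  # 10
# With 5 s of overlap each window advances 25 s, so 12 windows are needed.
print(len(chunk_windows(300.0, overlap_s=5.0)))  # 12
```

Each window would then be transcribed separately and the texts joined; the overlap trades a little extra compute for fewer words lost at boundaries.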
I tried asking Bing about this, but its answers weren’t helpful.
I even tried deploying it on a Hugging Face Space, but the result is the same everywhere.
What I want
I would be grateful if someone could write code that transcribes the entire audio file, regardless of its length.