Hugging Face model not transcribing the entire length of the audio file

awesome8 · August 7, 2023, 9:40am

Brief: I’m unable to transcribe more than a few seconds of a 5-minute audio file using a fine-tuned OpenAI Whisper model from Hugging Face.
I’m having trouble transcribing an audio file in a local Indian language using this Hugging Face model (thennal/whisper-medium-ml · Hugging Face). It transcribes only the first few seconds, but I would like the entire file transcribed. I’m running this on Google Colab.

What have I tried?

First, the code that returns only the first few seconds:

print(pipe("/content/audio.mp3"))

I also tried setting the model’s max_new_tokens, which produced an even shorter transcription, and I wasn’t able to go above 500.
I tried DEFAULT_INPUT_AUDIO_MAX_DURATION = 300, which resulted in an error.
I tried asking Bing about this, but its answers were not useful.
I even tried deploying it on a Space, but the result is the same everywhere.

What do I want?

I would be grateful if someone could write code for me that transcribes the entire audio file, no matter how long it is.
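
For reference, a minimal sketch of one approach that is commonly suggested for this situation. Whisper models process audio in windows of about 30 seconds, which would explain why only the start of a 5-minute file comes back. The transformers ASR pipeline accepts a chunk_length_s argument that splits longer audio into chunks and joins the results. The helper names build_asr_kwargs and transcribe_long_audio are hypothetical, the 30-second value is an assumption, and this has not been checked against this particular fine-tuned checkpoint.

```python
from typing import Any, Dict

# Model named in the post; swap in any Whisper checkpoint from the Hub.
MODEL_ID = "thennal/whisper-medium-ml"


def build_asr_kwargs(chunk_length_s: float = 30.0) -> Dict[str, Any]:
    """Arguments for transformers.pipeline (hypothetical helper).

    chunk_length_s asks the pipeline to split long audio into chunks of
    this many seconds instead of reading only the first window.
    """
    return {
        "task": "automatic-speech-recognition",
        "model": MODEL_ID,
        "chunk_length_s": chunk_length_s,
    }


def transcribe_long_audio(path: str, chunk_length_s: float = 30.0) -> str:
    """Transcribe a whole audio file and return the joined text."""
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline

    pipe = pipeline(**build_asr_kwargs(chunk_length_s))
    # The ASR pipeline returns a dict; the transcription is under "text".
    return pipe(path)["text"]
```

Calling print(transcribe_long_audio("/content/audio.mp3")) would then stand in for the original print(pipe(...)) line. Decoding an mp3 from a file path assumes ffmpeg is available in the environment.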
