The output sequence length of Whisper ASR model

mrs2022 · January 12, 2023, 4:27pm

Hi
I am running several Whisper ASR models (tiny, base, and medium) on my audio files. The max_length argument, which caps the output sequence length, appears to be set to a small value (448). Changing it on the model or through the configuration has no effect; I always get output of the same length. This is a problem for large audio files. The original Whisper model from the OpenAI GitHub repository, however, can handle larger audio files.

Could anyone help with this?

Thanks

MLLife · April 21, 2023, 7:43am

Looking for a solution to the same problem, @sanchit-gandhi?

sanchit-gandhi · April 21, 2023, 6:57pm

Indeed, there is a solution using the pipeline: see the openai/whisper-large-v2 model card on Hugging Face and the accompanying Google Colab.
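For anyone landing here, a minimal sketch of the pipeline approach might look like the following. It assumes the `chunk_length_s` argument of the transformers ASR pipeline, which splits long audio into windows and transcribes each one, so the 448-token cap applies per window rather than to the whole file. `chunk_windows` and `transcribe_long` are hypothetical helper names, and `chunk_windows` only illustrates the idea of overlapping windows; it is not the library's internal chunking code. The 5-second stride and the default model ID are assumptions chosen for the example.

```python
def chunk_windows(total_s, chunk_s=30.0, stride_s=5.0):
    """Illustrative only: (start, end) windows in seconds that cover
    total_s seconds, with consecutive windows overlapping by 2 * stride_s."""
    if chunk_s <= 2 * stride_s:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    step = chunk_s - 2 * stride_s  # how far each new window advances
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows


def transcribe_long(audio_path, model_id="openai/whisper-tiny", chunk_s=30):
    """Transcribe a long audio file by letting the pipeline chunk it."""
    # Imported lazily so chunk_windows works without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_s,  # enables chunked long-form transcription
    )
    return asr(audio_path)["text"]
```

Calling `transcribe_long("my_audio.wav")` (the file name is a placeholder) should return one string for the whole file. Since Whisper works on 30-second inputs, `chunk_length_s=30` is the natural choice here.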
