Output sequence length of the Whisper ASR model

Hi
I am running several Whisper ASR models (tiny, base, and medium) on my audio files. The max_length argument, which caps the output sequence length, appears to be set to a small value (448). Changing this argument on the model or through the configuration has no effect, and I always get output of the same length. This is a problem for long audio files. The original Whisper model from the OpenAI GitHub repository, however, can handle longer audio files.

Could anyone help with this?

Thanks

Looking for a solution to the same problem, @sanchit-gandhi?

Indeed, there is a solution using the transformers pipeline: see the openai/whisper-large-v2 model card on Hugging Face and the accompanying Google Colab.
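For anyone landing here, a minimal sketch of the pipeline approach, assuming the `chunk_length_s` argument of the transformers automatic-speech-recognition pipeline as described on the model card. The helper names `whisper_pipeline_kwargs` and `transcribe_long_audio`, the default checkpoint `openai/whisper-tiny`, and the file name `audio.wav` are only illustrative, not something from the linked pages.

```python
def whisper_pipeline_kwargs(model_id="openai/whisper-tiny", chunk_length_s=30):
    """Build keyword arguments for a chunked ASR pipeline (illustrative helper)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Whisper sees 30-second windows; chunking lets the pipeline split
        # long audio into such windows and merge the partial transcripts.
        "chunk_length_s": chunk_length_s,
    }


def transcribe_long_audio(audio_path, model_id="openai/whisper-tiny"):
    """Transcribe an audio file of arbitrary length (hypothetical wrapper)."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    asr = pipeline(**whisper_pipeline_kwargs(model_id=model_id))
    # The pipeline returns a dict; the full transcript is under "text".
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # "audio.wav" is a placeholder path; replace it with your own file.
    print(transcribe_long_audio("audio.wav"))
```

As far as I understand, the 448-token limit applies to each 30-second window rather than to the whole file, so with chunking the final transcript can be much longer than 448 tokens without touching max_length.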
