ASR on inference endpoints

Hi there,

I am a bit lost on which path to take for the following task:
I want to run Whisper large-v2 on quite a lot of audio files. The options that currently work are the OpenAI API (costly) and local inference with pipelines (slow). Another, hopefully simple, option would be to use Inference Endpoints instead, based on the openai/whisper-large-v2 repo.
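For comparison, this is roughly how the arguments are passed on the local pipeline route. In the transformers ASR pipeline the chunking argument is called `chunk_length_s`, and the language is forwarded to Whisper's `generate()` through `generate_kwargs`. This is a minimal sketch, assuming a transformers release recent enough that Whisper's `generate()` accepts a `language` keyword and that ffmpeg is installed for decoding files. `transcribe_files` and its defaults are hypothetical, and the `pipe` argument exists only so a stub can be swapped in for testing.

```python
from typing import Any, Callable, List, Optional


def transcribe_files(
    paths: List[str],
    language: Optional[str] = None,
    chunk_length_s: float = 30,
    batch_size: int = 8,
    pipe: Optional[Callable[..., Any]] = None,
) -> List[str]:
    """Transcribe several audio files with Whisper large-v2 via a transformers pipeline."""
    if pipe is None:
        # Imported lazily so the function can be exercised with a stub pipeline.
        from transformers import pipeline

        pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")
    generate_kwargs = {"task": "transcribe"}
    if language is not None:
        # Forwarded to model.generate(); forces the decoding language.
        generate_kwargs["language"] = language
    # A list of file paths yields one {"text": ...} dict per file.
    results = pipe(
        paths,
        chunk_length_s=chunk_length_s,  # split long audio into windows of this length
        batch_size=batch_size,  # number of chunks run through the model at once
        generate_kwargs=generate_kwargs,
    )
    return [r["text"] for r in results]
```

Example: `transcribe_files(["a.mp3", "b.mp3"], language="german")`. If a GPU is available, passing `device=0` to `pipeline(...)` is likely to matter more for speed than the batch size.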

However, it seems that I cannot pass arguments there, such as chunk_size, language, etc. (at least, that is how I understand the documentation). So my questions are:

  • Is it correct that I cannot pass arguments when I simply set up the inference endpoint with the openai/whisper-large-v2 repo? If not, how do I do it?
  • If so, what is the alternative? Custom handlers look like a possible solution, but I am a bit lost on how the logic would work. I would somehow have to combine the repositories philschmid/openai-whisper-endpoint and openai/whisper-large-v2, which does not seem straightforward to me.
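One way the custom-handler route could look: Inference Endpoints loads a `handler.py` that defines an `EndpointHandler` class with `__init__(self, path="")` and `__call__(self, data)`, where `path` is the local directory of the repository the endpoint was created from. So instead of merging two repositories, you could add a handler like the one below to a copy of the whisper-large-v2 repo and load the model from `path`. This is a minimal sketch, not the handler from philschmid/openai-whisper-endpoint (that one uses the openai-whisper package). The JSON contract `{"inputs": <base64 audio>, "parameters": {...}}`, the two whitelists, `build_payload`, and the extra `pipe` argument are all hypothetical choices made for this example.

```python
import base64
from typing import Any, Callable, Dict, Optional

# Hypothetical whitelists: call-time pipeline arguments and Whisper generate() arguments
# a client may set; anything else in "parameters" is ignored.
PIPELINE_ARGS = {"chunk_length_s", "batch_size", "return_timestamps"}
GENERATE_ARGS = {"language", "task"}


class EndpointHandler:
    def __init__(self, path: str = "", pipe: Optional[Callable[..., Any]] = None):
        if pipe is None:
            # Lazy import: only needed on the endpoint itself.
            from transformers import pipeline

            # `path` is the local directory of the repository the endpoint was created from.
            pipe = pipeline("automatic-speech-recognition", model=path or "openai/whisper-large-v2")
        self.pipe = pipe

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        params = data.get("parameters") or {}
        # Raw audio bytes are passed through; a string is treated as base64-encoded audio.
        if isinstance(inputs, str):
            inputs = base64.b64decode(inputs)
        call_kwargs = {k: v for k, v in params.items() if k in PIPELINE_ARGS}
        generate_kwargs = {k: v for k, v in params.items() if k in GENERATE_ARGS}
        if generate_kwargs:
            call_kwargs["generate_kwargs"] = generate_kwargs
        # The pipeline decodes the bytes with ffmpeg and returns {"text": ...}.
        return self.pipe(inputs, **call_kwargs)


def build_payload(audio: bytes, **parameters: Any) -> Dict[str, Any]:
    """Client side: JSON body carrying base64 audio plus generation parameters."""
    return {"inputs": base64.b64encode(audio).decode("ascii"), "parameters": parameters}
```

Under these assumptions, a client would POST `build_payload(audio_bytes, language="german", chunk_length_s=30)` serialized as JSON with `Content-Type: application/json`, while a plain binary audio request still works with default settings.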

Any suggestions on this?

Thank you so much

Did you ever figure this out? When I deploy Whisper on Inference Endpoints, it never works. I haven’t had this issue before with other models.