Using Inference API with large audio files

Is there currently a way to use the Inference API for audio tasks with larger audio files? At the moment it appears to be limited to very short files; anything longer returns a 413 Payload Too Large error. The docs are silent on this, but our use case requires running inference on long audio files, so if this isn't supported, the API is not an option for us.

Given that the docs specify the request must consist of a binary payload, I’m inclined to think inference on long audio files isn’t presently supported?

We have the API working for a short test file, but a 9-minute FLAC file is rejected by the server as too large (and our use case involves files of up to 60 minutes).

Thought someone here might know! Thanks.
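One possible workaround, not confirmed by the docs or by anyone in this thread, is to split a long recording into short segments, send each segment as its own binary payload, and join the per-segment results. The sketch below assumes the audio has already been decoded to uncompressed PCM WAV: Python's standard `wave` module cannot read FLAC, so a 9-minute FLAC file would first need converting with an external tool such as ffmpeg. `chunk_seconds` is a hypothetical parameter you would tune to stay under the server's limit (which is not stated here), and the bearer-token `Authorization` header in the hypothetical `build_request` helper is an assumption about how the endpoint authenticates.

```python
import io
import urllib.request
import wave


def split_wav(wav_bytes: bytes, chunk_seconds: float) -> list[bytes]:
    """Split a PCM WAV file into self-contained WAV files of at most chunk_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # readframes returns b"" once the input is exhausted
                break
            buf = io.BytesIO()
            # Each chunk gets its own header, so it is a valid WAV file on its own.
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks


def build_request(url: str, token: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is one raw binary chunk."""
    # Assumption: the endpoint takes a raw binary body plus a bearer token header.
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Each request could then be sent with `urllib.request.urlopen(req)`. Cutting at fixed time boundaries can split words in half, so results near chunk edges may be less accurate; this only sidesteps the size limit and says nothing about whether long files are officially supported.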

Does anyone have some insight on this? 👀👀


I'm getting the same 413 error on all Speech Separation models. I tried both 8 kHz and 16 kHz sampling rates. I hope to get some confirmation about this.
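As a rough check on why switching between 8 kHz and 16 kHz may not make a difference: uncompressed PCM size scales linearly with duration, sampling rate, sample width, and channel count, so halving the rate only halves the payload. This is a minimal sketch assuming 16-bit mono PCM; compressed formats such as FLAC will be smaller by a content-dependent factor, and the server's actual size limit is not known from this thread.

```python
def pcm_payload_bytes(seconds: float, sample_rate: int,
                      sample_width: int = 2, channels: int = 1) -> int:
    """Size of the raw PCM samples in bytes (the ~44-byte WAV header is ignored)."""
    return int(seconds * sample_rate * sample_width * channels)


# A 9-minute clip (540 s) of 16-bit mono audio at the two rates tried above.
for rate in (8_000, 16_000):
    print(f"{rate} Hz: {pcm_payload_bytes(540, rate):,} bytes")
# prints:
# 8000 Hz: 8,640,000 bytes
# 16000 Hz: 17,280,000 bytes
```

By the same formula, a 60-minute file at 16 kHz is 115,200,000 bytes of raw samples, so changing the sampling rate alone can never close a gap of more than a factor of two.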

I am facing a similar problem with an image object recognition use case.

Hi there,

My name is Julien, from France.

I am also trying to post large audio files to a speech-to-text API hosted here and get the 413 error message.

Did anyone find a solution or get an info from the tech team?

Best regards,