How to use Inference API to perform speech recognition

Hi everyone.
I am working on a custom project that requires speech recognition.
I need to set up speech recognition with Whisper, but the project is just for internal use by my family.

In a realistic scenario, I would not make more than 100 requests per day, and each request would use the model for 1-10 minutes.

I set up an Inference Endpoint, thinking I would only be billed when my requests happen, but the endpoint appears to always be in a “Running” state, so I have simply been billed €0.50 per hour since I created it, which is far too much for me, completely unbearable.

So I tried to use the API:

import os

import requests
from dotenv import load_dotenv

load_dotenv()


API_TOKEN = os.environ.get("HF_TOKEN")

audio_file_path = "sample_data/10_min_meeting.wav"

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"


def query(filename):
    # Send the raw audio bytes as the request body
    with open(filename, "rb") as f:
        audio_bytes = f.read()
    response = requests.post(API_URL, headers=headers, data=audio_bytes)
    return response


response = query(audio_file_path)
print(response.status_code, response.text)

But this only results in a 413 “Payload Too Large” error.
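For reference, a back-of-the-envelope estimate explains the 413: an uncompressed WAV grows linearly with duration, so a 10-minute recording is tens of megabytes, well beyond typical request-body limits. A minimal sketch (the 16 kHz mono 16-bit format is an assumption, not a value read from my actual file):

```python
def wav_payload_bytes(seconds, rate=16000, sample_width=2, channels=1):
    """Estimate the size in bytes of an uncompressed WAV payload.

    Assumes a standard 44-byte RIFF header plus raw PCM frames.
    The default rate/width/channels are assumptions; adjust them
    to the file's actual parameters.
    """
    return 44 + seconds * rate * sample_width * channels


# A 10-minute (600 s) recording at 16 kHz, 16-bit mono:
print(wav_payload_bytes(600))  # → 19200044, i.e. roughly 18 MiB
```

So even at a modest sample rate, a file like mine is far over any single-digit-megabyte request limit.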

What I need is just to run the model to perform the speech recognition remotely, since I do not have the hardware to run it on my own.
I am willing to pay for PRO or for other paid services if they let me achieve this in a feasible manner (€0.50 per hour is not feasible; I would like to stay under €50/month).


The default limit seems to be around 2 MB. However, model authors and Space authors may have a way to increase it…?
I have also heard that the limit is relaxed when using an Inference Endpoint.
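If the serverless limit cannot be raised, one workaround is to split the audio into chunks that each fit under the limit and transcribe them one request at a time. A rough sketch using only the standard library’s `wave` module (the 30-second chunk length is an arbitrary assumption, and naive splitting can cut a word in half at a chunk boundary):

```python
import io
import wave


def split_wav(path, chunk_seconds=30):
    """Split a WAV file into in-memory WAV payloads of at most
    chunk_seconds each, suitable for sending as separate requests."""
    chunks = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # same rate/width/channels as the source
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each returned bytes object is a complete WAV file, so it can be passed directly as a request body; the per-chunk transcripts then need to be concatenated in order.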