Hi everyone.
I am working on a custom project that requires speech recognition. I need to set it up with Whisper, but the project is only for my family's internal use.
In a realistic scenario, I would make no more than 100 requests per day, and each request would run the model on 1-10 minutes of audio.
I set up an Inference Endpoint, thinking I would only be billed when my requests happen, but the endpoint stays in a "Running" state, so I have been billed 0.50 per hour ever since I created it (roughly 360/month for an always-on endpoint), which is way too much for me.
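For context, here is a quick back-of-the-envelope comparison of what the always-on endpoint costs versus what my actual usage would cost if billing only covered compute time (assuming the 0.50/hour rate and the usage pattern above; these are my own rough estimates, not official pricing math):

```python
# Assumption: 0.50/hour endpoint rate, 100 requests/day, 1-10 minutes each.
HOURLY_RATE = 0.50
DAYS_PER_MONTH = 30

# Always-on endpoint: billed every hour regardless of traffic.
always_on = HOURLY_RATE * 24 * DAYS_PER_MONTH

# Hypothetical pay-per-compute billing:
best_case = HOURLY_RATE * (100 * 1 / 60) * DAYS_PER_MONTH    # 1-minute requests
worst_case = HOURLY_RATE * (100 * 10 / 60) * DAYS_PER_MONTH  # 10-minute requests

print(always_on)            # 360.0
print(round(best_case, 2))  # 25.0
print(round(worst_case, 2)) # 250.0
```

So even the worst case of my actual usage is a fraction of the always-on price, and the typical case would fit my budget.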
So I tried the serverless Inference API instead:
import os

import requests
from dotenv import load_dotenv

# Read HF_TOKEN from a local .env file
load_dotenv()
API_TOKEN = os.environ.get("HF_TOKEN")

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

audio_file_path = "sample_data/10_min_meeting.wav"

def query(filename):
    # Send the raw audio bytes as the request body
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response

response = query(audio_file_path)
print(response)
But this only leads to a 413 "Payload Too Large" error.
What I need is just to run the model remotely for speech recognition, since I do not have the hardware to run it myself.
I am willing to pay for PRO or for other paid services if they let me achieve this in a feasible way (0.5€ per hour always-on is not feasible; I would like to stay under 50€/month).