Hi, I’m trying to use the Automatic Speech Recognition API, but the docs are … light.
When I copy/paste the example code from the docs (below for convenience):
import json
import requests

API_TOKEN = "hf_xxx"  # my token, redacted here
API_URL = "https://api-inference.huggingface.co/models/facebook/wav2vec2-base-960h"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(filename):
    # read the audio file as raw bytes and POST it to the inference endpoint
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query("sample1.flac")
… I get …
{
"error": "Model facebook/wav2vec2-base-960h is currently loading",
"estimated_time": 20
}
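For now I’ve worked around this with a naive retry loop on my side, based only on the error shape above and reusing the `query` function from the example (so this is purely my own sketch, and I’m guessing `estimated_time` is seconds, see my question below):

```python
import time

def query_with_retry(filename, max_attempts=5):
    # Keep re-posting the audio until the model stops reporting
    # "currently loading". Assumes estimated_time is in seconds.
    for _ in range(max_attempts):
        result = query(filename)
        if isinstance(result, dict) and "error" in result and "estimated_time" in result:
            time.sleep(result["estimated_time"])
            continue
        return result
    raise RuntimeError("model never finished loading")

data = query_with_retry("sample1.flac")
print(data)
```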
The docs then say “no other parameters are currently allowed”. Does this mean I can’t ask it to use a GPU, for instance?
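To make that concrete, this is the kind of request I was hoping to be able to send (entirely hypothetical on my part — the JSON body and the `use_gpu` option are what I’d expect coming from other hosted ASR services, not anything I found in the docs; `API_URL` and `headers` are from the example above):

```python
import base64
import requests

# Hypothetical payload, NOT from the docs -- just illustrating the kind
# of options I was hoping the endpoint would accept.
with open("sample1.flac", "rb") as f:
    payload = {
        "inputs": base64.b64encode(f.read()).decode("utf-8"),
        "options": {
            "use_gpu": True,  # run on GPU-backed inference?
        },
    }

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```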
So
- It’d be nice if the docs had sample code that worked out of the box. Developer UX is important.
- It’d be nice if the docs also documented the response format. For instance, even for the error result above:
  estimated_time: 20
  is this minutes, days, centuries, nanoseconds?
- A very common ASR feature is word-by-word timestamps, for alignment use cases. Does this API support that, or harmonise in any way the ASR engines underneath (SpeechBrain and another one)? Something like the sketch below is what I’m after.
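To be clear about what I mean by word-level timestamps, a response shaped roughly like this would cover my alignment use case (this is my wish list, not something I believe the API returns today):

```python
# Hypothetical response shape I'd like to see -- not the current API output.
desired = {
    "text": "SOME EXAMPLE TRANSCRIPT",
    "words": [
        {"word": "SOME", "start": 0.12, "end": 0.38},     # times in seconds
        {"word": "EXAMPLE", "start": 0.41, "end": 0.80},
        {"word": "TRANSCRIPT", "start": 0.84, "end": 1.35},
    ],
}

# With that shape, aligning the transcript to the audio is trivial:
for w in desired["words"]:
    print(f'{w["word"]}: {w["start"]:.2f}s -> {w["end"]:.2f}s')
```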
I’m poised to shell out some big bucks for GPU-level support at HF, but I need to see much more pro-level docs in this area.