Hello,
I am seeing a processing-time difference of 1 to 2 seconds between the widget (not cached) and the Inference API when testing the same .wav file.
The model I tested: speechbrain/asr-wav2vec2-commonvoice-fr. I checked that this model does not support accelerated inference.
I am wondering whether this is normal behavior?
To reproduce
Use the same .wav file to test the model via:
- The widget: speechbrain/asr-wav2vec2-commonvoice-fr · Hugging Face (not the cached file)
- The Inference API, with the code sample below:
%%time
import json
import requests

API_TOKEN = ""
model_id = "speechbrain/asr-wav2vec2-commonvoice-fr"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://api-inference.huggingface.co/models/{model_id}"

def query(filename, API_URL, headers):
    # Read the raw audio bytes and POST them to the Inference API
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query("example2.wav", API_URL, headers)
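To check whether the 1-2 second gap is a one-off cold start or a steady difference, it may help to time several consecutive requests rather than a single %%time cell. Below is a minimal sketch; the time_calls helper is my own illustration, not part of the Inference API, and the commented line shows how it would wrap the query() call above.

```python
import time

def time_calls(fn, n=5):
    """Call fn() n times and return the per-call latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies

# With the query() function from the sample above (hypothetical usage):
# latencies = time_calls(lambda: query("example2.wav", API_URL, headers))
# print(f"min={min(latencies):.2f}s max={max(latencies):.2f}s")

# Self-contained demo with a stand-in workload:
latencies = time_calls(lambda: time.sleep(0.01), n=3)
print([f"{t:.3f}" for t in latencies])
```

If only the first call is slow, the difference is likely model loading rather than inference itself.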
Expected behavior
Similar process time