Widget faster than Inference API?


I am seeing a processing-time difference of 1 to 2 seconds between the widget (not cached) and the Inference API when testing the same .wav file.

The model I tested: speechbrain/asr-wav2vec2-commonvoice-fr. I checked that this model does not support accelerated inference.

I am wondering whether this is normal behavior.

To reproduce

Use the same .wav file to test the model with:

  1. The widget: speechbrain/asr-wav2vec2-commonvoice-fr · Hugging Face (not the cached result)
  2. The Inference API, using the code sample below:
import json
import requests

model_id = "speechbrain/asr-wav2vec2-commonvoice-fr"
API_URL = f"https://api-inference.huggingface.co/models/{model_id}"
headers = {"Authorization": f"Bearer {API_TOKEN}"}  # API_TOKEN must be a valid Hugging Face access token

def query(filename, API_URL, headers):
    # Send the raw audio bytes to the Inference API and decode the JSON response
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query("example2.wav", API_URL, headers)
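To put numbers on the 1–2 second gap, each call can be wrapped in a simple timer. `timed` below is a hypothetical helper (not part of the Inference API); in practice `fn` would be the `query` function above, called with the same .wav file.

```python
import time

def timed(fn, *args, **kwargs):
    # Return (result, seconds elapsed) for a single call of fn
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in call for illustration; replace with:
# transcript, seconds = timed(query, "example2.wav", API_URL, headers)
result, seconds = timed(sum, range(1000))
```

Running the timed call a few times for both the widget and the API would make the comparison less sensitive to network jitter.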

Expected behavior

Similar processing times for the widget and the Inference API.