Trouble Invoking GPU-Accelerated Inference

We recently signed up for an “Organization-Lab” account and are trying to use the various BlenderBot models available on Hugging Face with GPU-Accelerated Inference, as referenced here:

For example, we are calling
https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M
with our org bearer token and the following input:

{
  "inputs": {
    "past_user_inputs": null,
    "generated_responses": null,
    "text": "Hi, how are you today?"
  },
  "options": {
    "use_gpu": true,
    "use_cache": false,
    "wait_for_model": false
  }
}
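For anyone trying to reproduce this, here is a minimal Python sketch of the request described above, using only the standard library. The function name `build_request` and the placeholder token are assumptions made for illustration; the URL, payload, and options are copied from the post as-is. It only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint taken from the post.
API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M"


def build_request(token, text):
    """Build (but do not send) the POST request described in the post.

    `build_request` is a hypothetical helper name, not part of any library.
    """
    payload = {
        "inputs": {
            # None is serialized as JSON null, matching the post's payload.
            "past_user_inputs": None,
            "generated_responses": None,
            "text": text,
        },
        "options": {
            "use_gpu": True,
            "use_cache": False,
            "wait_for_model": False,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)`; that call raises `urllib.error.HTTPError` on 4xx and 5xx responses, so the 503 and 400 statuses would surface as exceptions carrying the status code.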

When we do this, we initially get “503 Service Unavailable” errors, which we took to mean the model was loading, similar to when using the default CPU-Accelerated Inference. However, after a few minutes, we receive “400 Bad Request” (ProtocolError) on every call.
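In case it helps others tell the two failure modes apart, here is a rough sketch of how a client might retry while the model appears to be loading (503) but stop as soon as any other status, such as 400, comes back. It assumes, as we did, that 503 means "still loading". The helper `call_with_retry` and the `send` callable, which is expected to return a `(status_code, body)` tuple, are hypothetical names and not part of any Hugging Face library.

```python
import time


def call_with_retry(send, max_attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    A 503 is treated as "model still loading" (an assumption) and retried;
    any other status, including 400, is returned immediately.
    """
    status, body = send()
    for _ in range(max_attempts - 1):
        if status != 503:
            break
        sleep(delay_seconds)  # wait before asking again
        status, body = send()
    return status, body
```

With this shape, a persistent 400 is returned to the caller on the first non-503 response rather than being retried, which matches what we saw: retrying did not make the 400 go away. Setting `"wait_for_model": true` in the request options may be an alternative to client-side retries for the loading phase.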

We believe our Organization-Lab plan should allow us to use this. We tried emailing api-enterprise@huggingface.co last Friday but have not had any response.

Any help to test these BlenderBot models using GPU-Accelerated Inference would be greatly appreciated.

Follow-up: The support folks did end up getting back to us and had to fix something on their end. Hopefully this helps someone else as well!
