Trouble Invoking GPU-Accelerated Inference

We recently signed up for an “Organization-Lab” account and are trying to use various BlenderBot models available on Hugging Face with GPU-Accelerated Inference as referenced here:

For example, we are calling
https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M
with our org bearer token
and input of

{
  "inputs": {
    "past_user_inputs": null,
    "generated_responses": null,
    "text": "Hi, how are you today?"
  },
  "options": {
    "use_gpu": true,
    "use_cache": false,
    "wait_for_model": false
  }
}
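For anyone who wants to reproduce the call, here is a minimal Python sketch of the same request using only the standard library. The helper names `build_request` and `send` are hypothetical, and the token passed in is a placeholder for your own org bearer token; the URL and payload are the ones shown above.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M"


def build_request(token, text):
    """Build the POST request with the payload from this post (hypothetical helper)."""
    payload = {
        "inputs": {
            "past_user_inputs": None,      # serialized as JSON null
            "generated_responses": None,   # serialized as JSON null
            "text": text,
        },
        "options": {
            "use_gpu": True,
            "use_cache": False,
            "wait_for_model": False,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return (status, decoded body). Raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `send(build_request(token, "Hi, how are you today?"))` performs the actual network call.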

When we do this, we initially get 503 Service Unavailable errors, which we took to mean the model was loading, as happens with the default CPU inference. However, after a few minutes, every call returns “400 Bad Request” (ProtocolError).

We believe our Organizations-Lab plan should allow us to use this. We tried emailing api-enterprise@huggingface.co last Friday but have not had any response.

Any help to test these BlenderBot models using GPU-Accelerated Inference would be greatly appreciated.

Follow-up: The support folks did end up getting back to us and had to fix something on their end. Hopefully this helps someone else as well!


Very late to the party here.

Did you by any chance look at the body of the 400 response? Usually it will contain a helpful message.
400 means something is wrong in your query OR in the model configuration.
The API tries to send the most informative message whenever possible.
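To see that message from Python, the response body has to be read off the error object rather than discarded. Below is a small sketch using the standard library's `urllib.error.HTTPError`; the helper name `describe_error` is hypothetical, and the assumption that the body is JSON with an `"error"` field may not hold for every response, so the code falls back to the raw text.

```python
import json
import urllib.error


def describe_error(err):
    """Return (status_code, message) from an HTTPError (hypothetical helper).

    Assumes the body may be JSON with an "error" field; otherwise the raw
    body text is returned unchanged.
    """
    raw = err.read().decode("utf-8", errors="replace")
    try:
        parsed = json.loads(raw)
    except ValueError:
        return err.code, raw
    if isinstance(parsed, dict) and "error" in parsed:
        return err.code, str(parsed["error"])
    return err.code, raw
```

In practice this would be called from an `except urllib.error.HTTPError as err:` block around the request, logging both values.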

Excuse me, have you solved the problem? I ran into the same issue and got a 503 error.

Yes, unfortunately the message was simply “ProtocolError”.

Yes, the Hugging Face team had to fix something on their end. However, a 503 usually means the instance is spinning up for you, and it should resolve after a while. If it doesn’t, you may need to contact them or check whether a more detailed message comes back along with the 503.
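If you want to wait out the 503s in code, one option is a simple retry loop. This is only a sketch: `call_with_retry` is a hypothetical helper, `send` stands for any function of yours that performs the request and returns `(status, body)`, and the attempt count and delays are arbitrary choices. Setting `"wait_for_model": true` in the request options is another way to have the API wait for the model instead of returning 503.

```python
import time


def call_with_retry(send, max_attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503 (hypothetical helper).

    send must return a (status, body) tuple. The wait grows linearly with the
    attempt number. The last response is returned even if it is still a 503,
    so the caller can inspect its body.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        sleep(delay_seconds * attempt)
```

Any non-503 status, including a 400, is returned immediately, since retrying a bad request will not help.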