Trouble Invoking GPU-Accelerated Inference

Viren · April 12, 2022, 4:52pm

We recently signed up for an “Organization-Lab” account and are trying to use various BlenderBot models available on Hugging Face with GPU-Accelerated Inference, as referenced here:

For example, we are calling
https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M
with our organization's bearer token and the following input:

{
  "inputs": {
    "past_user_inputs": null,
    "generated_responses": null,
    "text": "Hi, how are you today?"
  },
  "options": {
    "use_gpu": true,
    "use_cache": false,
    "wait_for_model": false
  }
}
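For anyone reproducing the call, here is a minimal Python sketch that builds the same authenticated POST request using only the standard library. The token value is a placeholder (`hf_xxx` is not a real token), and the sketch only constructs the request. Sending it would be a call to `urllib.request.urlopen`, which is left out so nothing hits the network.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M"


def build_payload(text, past_user_inputs=None, generated_responses=None,
                  use_gpu=True, use_cache=False, wait_for_model=False):
    """Mirror the conversational payload from the post above."""
    return {
        "inputs": {
            "past_user_inputs": past_user_inputs,        # None serializes to null
            "generated_responses": generated_responses,
            "text": text,
        },
        "options": {
            "use_gpu": use_gpu,
            "use_cache": use_cache,
            "wait_for_model": wait_for_model,
        },
    }


def build_request(token, payload, url=API_URL):
    """Build (but do not send) a JSON POST request with a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    HF_TOKEN = "hf_xxx"  # placeholder: substitute your organization's token
    req = build_request(HF_TOKEN, build_payload("Hi, how are you today?"))
    # To actually send it: urllib.request.urlopen(req)
    print(req.get_method(), req.full_url)
```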

When we do this, we initially get 503 Service Unavailable errors, which we took to mean the model was loading, as happens with the default CPU-Accelerated Inference. However, after a few minutes, every call instead returns “400 Bad Request” (ProtocolError).

We believe our Organization-Lab plan should allow us to use this. We emailed api-enterprise@huggingface.co last Friday but have not received a response.

Any help testing these BlenderBot models with GPU-Accelerated Inference would be greatly appreciated.

Viren · April 12, 2022, 5:51pm

Follow-up: The support folks did end up getting back to us and had to fix something on their end. Hopefully this helps someone else as well!

Narsil · July 21, 2022, 7:35am

Very late to the party here.

Did you by any chance look at the body of the 400 response? It usually contains a helpful message.
A 400 means something is wrong either in your query or in the model configuration.
The API tries to send the most informative message it can.
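Following Narsil's suggestion, this small Python sketch pulls the human-readable message out of an error response instead of stopping at the status line. It assumes the body is JSON with an `error` field (treat that shape as an assumption, not a guarantee) and falls back to the raw text otherwise. The helper names are hypothetical.

```python
import io
import json
import urllib.error


def explain_error(status, raw_body):
    """Return a readable description of an HTTP error response body."""
    text = raw_body.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: problems are reported as {"error": "..."} (or a list of strings).
    if isinstance(parsed, dict) and "error" in parsed:
        detail = parsed["error"]
        if isinstance(detail, list):
            detail = "; ".join(str(item) for item in detail)
        return f"HTTP {status}: {detail}"
    # Not JSON (or no "error" key): show whatever text came back.
    return f"HTTP {status}: {text.strip() or '<empty body>'}"


def explain_http_error(err):
    """Read the body of a urllib HTTPError and describe it."""
    return explain_error(err.code, err.read())


if __name__ == "__main__":
    # Offline demo: a fake 400 response carrying a JSON error body.
    fake = urllib.error.HTTPError(
        "https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M",
        400, "Bad Request", None, io.BytesIO(b'{"error": "example message"}'),
    )
    print(explain_http_error(fake))
```

Around a real call, the pattern is to wrap `urllib.request.urlopen(req)` in `try` and pass any caught `urllib.error.HTTPError` to `explain_http_error`.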

szdd520 · September 6, 2022, 12:44pm

Excuse me, have you solved the problem? I am running into the same issue; the API returns a 503 error.

Viren · October 3, 2022, 3:34pm

Yes; unfortunately, the message was simply “ProtocolError”.

Viren · October 3, 2022, 3:35pm

Yes, the Hugging Face team had to fix something on their end. However, a 503 usually means the instance is spinning up for you and should resolve after a while. If it doesn't, you may need to contact them or check whether a more detailed message came back with the 503.
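Viren's advice about 503s can be turned into a small polling loop. The Python sketch below retries only on 503, waits between attempts, and gives up after a fixed number of tries. Any other status, such as the 400 above, is returned immediately because retrying will not fix it. Two assumptions are worth stating. First, the wait uses an `estimated_time` field if the 503 body has one, which is an assumption about the response shape. Second, `send` is a hypothetical callable returning `(status_code, parsed_body)`, injected so the loop can be exercised without the network.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=10.0, max_wait=60.0,
                    sleep=time.sleep):
    """Call send() until it stops returning 503 or attempts run out.

    send: hypothetical callable returning (status_code, parsed_json_body).
    Returns the last (status_code, body) pair observed.
    """
    status, body = None, None
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503:
            # Success, or an error (e.g. 400) that retrying will not fix.
            return status, body
        if attempt == max_attempts:
            break  # out of attempts; report the final 503
        # Assumption: a loading model may report {"estimated_time": seconds}.
        hint = body.get("estimated_time") if isinstance(body, dict) else None
        wait = float(hint) if isinstance(hint, (int, float)) else default_wait
        sleep(min(wait, max_wait))  # cap the wait so one bad hint can't stall us
    return status, body
```

An alternative to polling on the client is to set `"wait_for_model": true` in the request options (the post above leaves it at `false`), which asks the API to hold the request until the model is loaded rather than answer with a 503.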
