We recently signed up for an “Organization-Lab” account and are trying to use various blenderbot models available on hugging face with GPU-Accelerated Inference as referenced here:
For example, we are calling
https://api-inference.huggingface.co/models/facebook/blenderbot_small-90M
with our org bearer token
and input of
{
"inputs": {
"past_user_inputs": null,
"generated_responses": null,
"text": "Hi, how are you today?"
},
"options": {
"use_gpu": true,
"use_cache": false,
"wait_for_model": false
}
}
When we do this, we initially get 503 service unavailable errors, which we believed to mean the model was loading, similar to when using the default CPU-Accelerated Inference. However, after a few minutes, we then receive “400 Bad Request” (ProtocolError) with every call.
We believe our Organizations-Lab plan should allow for us to use this. We tried emailing api-enterprise@huggingface.co last Friday but have not had any response.
Any help to test these BlenderBot models using GPU-Accelerated Inference would be greatly appreciated.