Hi all,
I am getting the following error when I call `list_deployed_models()` on the `InferenceClient`:
```py
from huggingface_hub import InferenceClient

client = InferenceClient()
framework = "text-generation-inference"  # could also be e.g. "text-to-speech"
deployed_models = client.list_deployed_models([framework])
print(deployed_models)
```
Error:

```
BadRequestError: (Request ID: Root=1-67c47933-02bf30da3e80cb5307dc9184;4dbda49a-b0b7-48bc-b0f7-5f7e09fc28d6)
Bad request:
Not allowed to request framework/text-generation-inference for provider hf-inference
```
Any help is appreciated.
Regards, Raj
I think the URL referenced below is no longer valid. Perhaps the library hasn't been updated with the new URL?
There doesn't seem to be an open issue about it yet, but it might be fixed eventually.
GitHub issue (opened 03:04PM - 11 Jul 23 UTC, closed 09:37AM - 08 Sep 23 UTC, label: good first issue):
The new `InferenceClient` allows running inference in the cloud. For each implemented task, new users who don't know which model to use can rely on the recommended model on the (free) Inference API (implemented in https://github.com/huggingface/huggingface_hub/pull/1510). Since the recommended model might not be suitable for all use cases, it is also possible to specify any model as input to the `InferenceClient`. While most models on the HuggingFace Hub are compatible with the Inference API, waiting for the models to load can take a lot of time. Also, for large models the Inference API just cannot load them.
To ease user experience, it would be nice to have a utility `list_deployed_models` that would return a list of already deployed models for the user to try.
```py
class InferenceClient:
    ...
    def list_deployed_models(self, *, token: Optional[str] = None) -> Dict[str, List[str]]:
        ...
        return {"audio-to-audio": ["microsoft/speecht5_vc", ...], "text-generation": [...], ...}
```
This helper should help with discoverability.
---
Implementation-wise, this list of deployed models can be retrieved using the `"https://api-inference.huggingface.co/framework/{framework_name}"` where "framework_name" is a framework compatible with the Inference API. For example:
- https://api-inference.huggingface.co/framework/transformers
- https://api-inference.huggingface.co/framework/diffusers
- https://api-inference.huggingface.co/framework/text-generation-inference
- https://api-inference.huggingface.co/framework/sentence-transformers
Around 30 frameworks are available, but I think we should first focus on the main frameworks. Each URL returns a list of dictionaries, where each item is a deployed model: `{"compute_type":"cpu","model_id":"microsoft/speecht5_vc","sha":"c418ba2144598f973d0fddc9fd5909a3af83de3d","task":"audio-to-audio"}`. What is interesting for us is `"model_id"` and `"task"`. What a `list_deployed_models` method would do is loop through all main frameworks, loop through all items, and build an output dictionary with 'task' as key and a list of model ids as value.
Example:
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.list_deployed_models()["audio-to-audio"]
["microsoft/speecht5_vc", ...]
```
Happy to discuss the API design and update it if needed :) We still have to figure out what to do if several "sha" are deployed for a single model.
For context, here is (roughly) the relevant excerpt from the `InferenceClient.list_deployed_models` implementation in `huggingface_hub`, which builds the request URL from `constants.INFERENCE_ENDPOINT`:

```py
# Excerpt from the body of InferenceClient.list_deployed_models; helpers such as
# get_session, build_hf_headers and hf_raise_for_status are huggingface_hub internals,
# and `frameworks` is the list resolved from the method's argument.
models_by_task: Dict[str, List[str]] = {}

def _unpack_response(framework: str, items: List[Dict]) -> None:
    for model in items:
        if framework == "sentence-transformers":
            # Models running with the `sentence-transformers` framework can work with both tasks
            # even if not branded as such in the API response
            models_by_task.setdefault("feature-extraction", []).append(model["model_id"])
            models_by_task.setdefault("sentence-similarity", []).append(model["model_id"])
        else:
            models_by_task.setdefault(model["task"], []).append(model["model_id"])

for framework in frameworks:
    response = get_session().get(
        f"{constants.INFERENCE_ENDPOINT}/framework/{framework}", headers=build_hf_headers(token=self.token)
    )
    hf_raise_for_status(response)
    _unpack_response(framework, response.json())

# Sort alphabetically for discoverability and return
for task, models in models_by_task.items():
    models_by_task[task] = sorted(set(models), key=lambda x: x.lower())
return models_by_task
```
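One quick way to test the hypothesis that the endpoint itself has changed is to call it directly. This is only a diagnostic sketch (replace the placeholder token with your own; the exact response depends on the current state of the API):

```py
import requests

# Probe the endpoint that list_deployed_models() hits under the hood.
# A 4xx/5xx status here would be consistent with the BadRequestError above.
url = "https://api-inference.huggingface.co/framework/text-generation-inference"
resp = requests.get(url, headers={"Authorization": "Bearer hf_yourhftoken"})
print(resp.status_code)
print(resp.text[:500])
```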
Hey @acloudfan,
I ran the code through `InferenceClient` and was able to replicate the same error you encountered. However, the error stack also includes an additional warning indicating that listing models via `InferenceClient` has been deprecated. Going forward, only warm models can be referenced, and this must be done using `HfApi`, not `InferenceClient`:
Warning:

```
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'list_deployed_models' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.33.0'. HF Inference API is getting revamped and will only support warm models in the future (no cold start allowed). Use `HfApi.list_models(..., inference_provider='...')` to list warm models per provider.
  warnings.warn(warning_message, FutureWarning)
```
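If you want to follow the warning's suggestion literally, the `inference_provider` argument of `HfApi.list_models` lists models that are currently warm for a given provider. A minimal sketch (parameter availability depends on your `huggingface_hub` version, so treat this as an illustration rather than a guaranteed API):

```py
from huggingface_hub import HfApi

api = HfApi()
# Warm text-generation models served by the "hf-inference" provider
models = api.list_models(
    inference_provider="hf-inference",
    pipeline_tag="text-generation",
    limit=20,
)
for m in models:
    print(m.id)
```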
As the warning suggests, modifying the implementation to use `HfApi` instead of `InferenceClient` allows model listing to work correctly. The following code successfully retrieves and lists the models:
```py
import os
os.environ["HF_TOKEN"] = "hf_yourhftoken"

from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="text-generation-inference", limit=1000)
for model in models:
    print(model)
```
You can use the code above to list models without any issues. Feel free to modify or remove the `limit` argument if you need to retrieve all models for a specific task.
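If you specifically want something shaped like the old `list_deployed_models` output (a mapping from task to model ids), you can group the results yourself. A small sketch along those lines, reusing the tag filter from above (assumptions: `pipeline_tag` may be `None` for some models, and the tag filter only approximates "currently deployed"):

```py
from collections import defaultdict
from huggingface_hub import HfApi

api = HfApi()
models_by_task = defaultdict(list)

# Group models carrying the text-generation-inference tag by their pipeline task,
# roughly mirroring the {task: [model_id, ...]} dict list_deployed_models used to return.
for m in api.list_models(filter="text-generation-inference", limit=1000):
    models_by_task[m.pipeline_tag or "unknown"].append(m.id)

for task, ids in sorted(models_by_task.items()):
    print(f"{task}: {len(ids)} models")
```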
For reference, you can check the list of available tasks in this message from @Wauplin (the GitHub issue quoted earlier in this thread).
Hope this helps! Let me know if it resolves your issue.