Hi all,
I am getting the following error when I call `list_deployed_models()` on the `InferenceClient`:
```py
from huggingface_hub import InferenceClient

client = InferenceClient()
framework = "text-generation-inference"  # could also be e.g. "text-to-speech"
deployed_models = client.list_deployed_models([framework])
print(deployed_models)
```
Error:

```
BadRequestError: (Request ID: Root=1-67c47933-02bf30da3e80cb5307dc9184;4dbda49a-b0b7-48bc-b0f7-5f7e09fc28d6)
Bad request:
Not allowed to request framework/text-generation-inference for provider hf-inference
```
Any help is appreciated.
Regards, Raj
I think the URL referenced below is no longer valid. Perhaps the library hasn't been updated with the new URL?
There doesn't seem to be an open issue about it yet, but it might be fixed eventually.
GitHub issue (opened 03:04PM - 11 Jul 23 UTC, closed 09:37AM - 08 Sep 23 UTC, label: good first issue):
The new `InferenceClient` allows running inference in the cloud. For each implemented task, new users who don't know which model to use can rely on the recommended model on the (free) Inference API (implemented in https://github.com/huggingface/huggingface_hub/pull/1510). Since the recommended model might not be suitable for all use cases, it is also possible to specify any model as input to the `InferenceClient`. While most models on the HuggingFace Hub are compatible with the Inference API, waiting for the models to load can take a lot of time. Also, for large models the Inference API just cannot load them.
To ease user experience, it would be nice to have a utility `list_deployed_models` that would return a list of already deployed models for the user to try.
```py
class InferenceClient:
    ...
    def list_deployed_models(self, *, token: Optional[str] = None) -> Dict[str, List[str]]:
        ...
        return {"audio-to-audio": ["microsoft/speecht5_vc", ...], "text-generation": [...], ...}
```
This helper should help with discoverability.
---
Implementation-wise, this list of deployed models can be retrieved using the `"https://api-inference.huggingface.co/framework/{framework_name}"` where "framework_name" is a framework compatible with the Inference API. For example:
- https://api-inference.huggingface.co/framework/transformers
- https://api-inference.huggingface.co/framework/diffusers
- https://api-inference.huggingface.co/framework/text-generation-inference
- https://api-inference.huggingface.co/framework/sentence-transformers
Around 30 frameworks are available, but I think we should first focus on the main frameworks. Each URL returns a list of dictionaries, where each item is a deployed model: `{"compute_type":"cpu","model_id":"microsoft/speecht5_vc","sha":"c418ba2144598f973d0fddc9fd5909a3af83de3d","task":"audio-to-audio"}`. What is interesting for us is `"model_id"` and `"task"`. What a `list_deployed_models` method would do is loop through all main frameworks, loop through all items, and build an output dictionary with 'task' as key and a list of model ids as value.
Example:
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.list_deployed_models()["audio-to-audio"]
["microsoft/speecht5_vc", ...]
```
Happy to discuss the API design and update it if needed :) We still have to figure out what to do if several "sha" are deployed for a single model.
For context, here is (roughly) the relevant excerpt from the `InferenceClient.list_deployed_models` implementation in `huggingface_hub`, which builds the request URL from `constants.INFERENCE_ENDPOINT`:

```py
# Excerpt from the body of InferenceClient.list_deployed_models; helpers such as
# get_session, build_hf_headers and hf_raise_for_status are huggingface_hub internals,
# and `frameworks` is the list resolved from the method's argument.
models_by_task: Dict[str, List[str]] = {}

def _unpack_response(framework: str, items: List[Dict]) -> None:
    for model in items:
        if framework == "sentence-transformers":
            # Models running with the `sentence-transformers` framework can work with both tasks
            # even if not branded as such in the API response
            models_by_task.setdefault("feature-extraction", []).append(model["model_id"])
            models_by_task.setdefault("sentence-similarity", []).append(model["model_id"])
        else:
            models_by_task.setdefault(model["task"], []).append(model["model_id"])

for framework in frameworks:
    response = get_session().get(
        f"{constants.INFERENCE_ENDPOINT}/framework/{framework}", headers=build_hf_headers(token=self.token)
    )
    hf_raise_for_status(response)
    _unpack_response(framework, response.json())

# Sort alphabetically for discoverability and return
for task, models in models_by_task.items():
    models_by_task[task] = sorted(set(models), key=lambda x: x.lower())
return models_by_task
```
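One quick way to test the hypothesis that the endpoint itself has changed is to call it directly. This is only a diagnostic sketch (replace the placeholder token with your own; the exact response depends on the current state of the API):

```py
import requests

# Probe the endpoint that list_deployed_models() hits under the hood.
# A 4xx/5xx status here would be consistent with the BadRequestError above.
url = "https://api-inference.huggingface.co/framework/text-generation-inference"
resp = requests.get(url, headers={"Authorization": "Bearer hf_yourhftoken"})
print(resp.status_code)
print(resp.text[:500])
```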
Hey @acloudfan,
I ran the code through `InferenceClient` and was able to replicate the same error you encountered. However, the error stack also includes an additional warning indicating that listing models via `InferenceClient` has been deprecated. Going forward, only warm models can be referenced, and this must be done using `HfApi`, not `InferenceClient`:
Warning:

```
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'list_deployed_models' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.33.0'. HF Inference API is getting revamped and will only support warm models in the future (no cold start allowed). Use `HfApi.list_models(..., inference_provider='...')` to list warm models per provider.
  warnings.warn(warning_message, FutureWarning)
```
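If you want to follow the warning's suggestion literally, the `inference_provider` argument of `HfApi.list_models` lists models that are currently warm for a given provider. A minimal sketch (parameter availability depends on your `huggingface_hub` version, so treat this as an illustration rather than a guaranteed API):

```py
from huggingface_hub import HfApi

api = HfApi()
# Warm text-generation models served by the "hf-inference" provider
models = api.list_models(
    inference_provider="hf-inference",
    pipeline_tag="text-generation",
    limit=20,
)
for m in models:
    print(m.id)
```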
As the warning suggests, modifying the implementation to use `HfApi` instead of `InferenceClient` allows model listing to work correctly. The following code successfully retrieves and lists the models:
```py
import os
os.environ["HF_TOKEN"] = "hf_yourhftoken"

from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="text-generation-inference", limit=1000)
for model in models:
    print(model)
```
You can use the code above to list models without any issues. Feel free to modify or remove the `limit` argument if you need to retrieve all models for a specific task.
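If you specifically want something shaped like the old `list_deployed_models` output (a mapping from task to model ids), you can group the results yourself. A small sketch along those lines, reusing the tag filter from above (assumptions: `pipeline_tag` may be `None` for some models, and the tag filter only approximates "currently deployed"):

```py
from collections import defaultdict
from huggingface_hub import HfApi

api = HfApi()
models_by_task = defaultdict(list)

# Group models carrying the text-generation-inference tag by their pipeline task,
# roughly mirroring the {task: [model_id, ...]} dict list_deployed_models used to return.
for m in api.list_models(filter="text-generation-inference", limit=1000):
    models_by_task[m.pipeline_tag or "unknown"].append(m.id)

for task, ids in sorted(models_by_task.items()):
    print(f"{task}: {len(ids)} models")
```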
For reference, you can check the list of available tasks in this message from @Wauplin (the GitHub issue quoted earlier in this thread).
Hope this helps! Let me know if it resolves your issue.