Provider="auto" always routes DeepSeek model to Novita even when Novita is disabled — ignoring org settings

I’m experiencing an issue with the inference router when using the following setup:

  • Model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

  • SDK: huggingface_hub.InferenceClient

  • Code:

from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key="…",
    provider="auto",
)

payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "messages": [{"role": "user", "content": "write a poem about the sea"}],
    "max_tokens": 3072,
}

response = client.chat_completion(**payload)
print(response["choices"][0]["message"]["content"])

Even though SambaNova is enabled for my organization and supports this model, and
even though Novita is explicitly disabled in my org settings,
provider="auto" consistently routes the request to Novita, resulting in:

403 Forbidden: Inference provider novita is not enabled for the org global.
Cannot access: https://router.huggingface.co/novita/v3/openai/chat/completions

It appears that the router is still trying to use Novita even though:

  • it is disabled in my organization,

  • SambaNova is enabled,

  • and SambaNova also supports the model.

This happens even after disabling Novita and refreshing the provider list.

It looks like the router might be using a cached or hard-coded “preferred provider” for DeepSeek models and not respecting the org-level provider configuration when provider="auto" is used.

Expected behavior:
provider="auto" should route to one of the providers enabled for my organization (e.g., SambaNova), or at least not attempt to use a disabled provider.

Actual behavior:
Router continues trying Novita despite it being disabled.

Could someone from the HF team confirm whether this is a routing bug, a caching issue, or expected behavior for DeepSeek models? And is there a way to make provider="auto" respect the org provider settings?

Thanks!


I suspect this is a bug, and no issue appears to have been raised for it yet. In the meantime, here is a compact debug path you can actually follow.


1. Confirm client + org context

  1. Upgrade the client (to rule out old routing bugs):

    pip install -U huggingface_hub
    
  2. Make sure you’re clearly in the org context:

    from huggingface_hub import InferenceClient
    
    client = InferenceClient(
        api_key="...",
        provider="auto",
        bill_to="your-org-name",  # org slug as on HF
    )
    

2. Turn on debug logging

  1. Set environment variable:

    export HF_DEBUG=1
    

    This makes huggingface_hub print each HTTP call as a curl command (including provider path and headers such as X-HF-Bill-To).

  2. (Optional) In code:

    from huggingface_hub.utils import logging as hf_logging
    hf_logging.set_verbosity_debug()
    
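One caveat worth adding: environment flags like HF_DEBUG are typically read once, when the library is imported (an assumption about huggingface_hub's internals, but a safe default). So if you set the flag from Python instead of the shell, set it before the import:

```python
import os

# Set the flag before huggingface_hub is imported; if the library reads it
# at import time, setting it afterwards would have no effect.
os.environ["HF_DEBUG"] = "1"

# Only now import and (optionally) raise log verbosity as well:
# from huggingface_hub.utils import logging as hf_logging
# hf_logging.set_verbosity_debug()
```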

3. Run three test calls with the same payload

Use the same MODEL and MESSAGES, changing only the provider:

from huggingface_hub import InferenceClient

API_KEY = "..."
ORG = "your-org-name"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
MESSAGES = [{"role": "user", "content": "write a poem about the sea"}]

for provider in ["auto", "sambanova", "novita"]:
    print(f"\n=== provider={provider!r} ===")
    client = InferenceClient(
        api_key=API_KEY,
        provider=provider,
        bill_to=ORG,
    )
    try:
        resp = client.chat_completion(
            model=MODEL,
            messages=MESSAGES,
            max_tokens=256,
        )
        print("OK:", resp["choices"][0]["message"]["content"][:80])
    except Exception as e:
        print("ERROR:", repr(e))

Interpretation:

  • If auto and novita both 403 with “novita not enabled for org” and sambanova works:

    • You have proven that auto is selecting Novita and that SambaNova is a valid alternative.
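If you want the interpretation step to be mechanical, a small helper (hypothetical, the names are mine) can classify the three outcomes collected by the loop above:

```python
def diagnose(results: dict) -> str:
    """Classify per-provider outcomes: "ok" or the captured error string."""
    auto_error = results.get("auto", "")
    if "novita" in auto_error.lower() and results.get("sambanova") == "ok":
        return ("repro confirmed: auto selects the disabled provider (novita) "
                "while sambanova works")
    if results.get("auto") == "ok":
        return "cannot reproduce: auto routed to an enabled provider"
    return "inconclusive: check the HF_DEBUG output for the provider path"

# Example with the outcomes described in this thread:
outcomes = {
    "auto": "403 Forbidden: Inference provider novita is not enabled for the org",
    "sambanova": "ok",
    "novita": "403 Forbidden: Inference provider novita is not enabled for the org",
}
print(diagnose(outcomes))
```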

4. Inspect the logged curl command

From the HF_DEBUG output:

  1. Copy the curl for the failing provider="auto" call.

    • You should see a URL like .../novita/v3/openai/chat/completions.
  2. Run it manually with curl -v and note:

    • Status code (403).
    • Any X-Request-Id header (for HF support).

Then:

  • Take the same curl, changing only novita → sambanova in the URL.
  • Run again; if that works, you have a minimal “same request, different provider” repro.
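To build the swapped URL without hand-editing, here is a small stdlib sketch; the router URL is the one from the 403 above, and treating the first path segment as the provider is an assumption based on that URL's shape:

```python
from urllib.parse import urlparse, urlunparse

# Router URL taken from the 403 error message in this thread.
ROUTER = "https://router.huggingface.co/novita/v3/openai/chat/completions"

def swap_provider(url: str, new_provider: str) -> str:
    """Replace the provider segment (first path component) of a router URL."""
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    segments[0] = new_provider  # e.g. "novita" -> "sambanova"
    return urlunparse(parts._replace(path="/" + "/".join(segments)))

print(swap_provider(ROUTER, "sambanova"))
# -> https://router.huggingface.co/sambanova/v3/openai/chat/completions
```

Run the original and the swapped curl with identical headers and body; a 403 vs. 200 pair is exactly the minimal repro to attach to an issue.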

5. Verify UI settings once

  • In org settings: Novita disabled, SambaNova enabled.
  • In personal Inference Provider settings: SambaNova enabled; ideally Novita disabled or at least lower priority.

6. Decide next step

  • Short-term fix: use provider="sambanova" or model="...:sambanova" in code.

  • For HF: open a GitHub issue and attach:

    • The minimal script,
    • The failing + working curl commands,
    • The X-Request-Id from the 403,
    • A note that org has Novita disabled but auto still selects it.
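For the short-term fix, a minimal sketch of the pinning options; the ':provider' model suffix is the one mentioned in step 6, and whether your huggingface_hub version accepts it is worth verifying:

```python
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

def pin_provider(model_id: str, provider: str) -> str:
    """Build the 'model:provider' form so the router skips auto-selection."""
    return f"{model_id}:{provider}"

# Option A: construct the client with provider="sambanova" instead of "auto".
# Option B: keep provider="auto" and pin the provider per request:
pinned = pin_provider(MODEL, "sambanova")
print(pinned)  # deepseek-ai/DeepSeek-R1-Distill-Llama-70B:sambanova
```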

That is the simplest end-to-end debug path that both proves the behavior and gives HF exactly what they need to investigate.
