I’m experiencing an issue with the inference router when using the following setup:
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key="…",
    provider="auto",
)

payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "messages": [{"role": "user", "content": "write a poem about the sea"}],
    "max_tokens": 3072,
}

response = client.chat_completion(**payload)
print(response["choices"][0]["message"]["content"])
```
Even though SambaNova is enabled for my organization and supports this model, and
even though Novita is explicitly disabled in my org settings,
provider="auto" consistently routes the request to Novita, resulting in:
```
403 Forbidden: Inference provider novita is not enabled for the org global.
Cannot access: https://router.huggingface.co/novita/v3/openai/chat/completions
```
The router keeps trying Novita even after I disable it in the org settings and refresh the provider list.
It looks like the router might be using a cached or hard-coded “preferred provider” for DeepSeek models and not respecting the org-level provider configuration when provider="auto" is used.
Expected behavior:
provider="auto" should route to one of the providers enabled for my organization (e.g., SambaNova), or at least not attempt to use a disabled provider.
Actual behavior:
Router continues trying Novita despite it being disabled.
Could someone from the HF team confirm whether this is a routing bug, a caching issue, or expected behavior for DeepSeek models? And is there a way to make provider="auto" respect the org provider settings?
Thanks!
I personally suspect it’s a bug, and no issue has been raised yet. For now, here is a compact debug path you can follow to pin it down reliably.
1. Confirm client + org context
- Upgrade the client (to avoid old routing bugs):

```shell
pip install -U huggingface_hub
```
- Make sure you’re clearly in the org context:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key="...",
    provider="auto",
    bill_to="your-org-name",  # org slug as shown on HF
)
```
2. Turn on debug logging
- Set the environment variable:

```shell
export HF_DEBUG=1
```

This makes huggingface_hub print each HTTP call as a curl command (including the provider path and headers such as X-HF-Bill-To).
- (Optional) In code:

```python
from huggingface_hub.utils import logging as hf_logging

hf_logging.set_verbosity_debug()
```
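If you’d rather keep everything in one script, the debug flag can also be set from Python. A small sketch (assumption: the library reads HF_DEBUG from the environment, so it should be set before huggingface_hub is imported):

```python
import os

# Set the debug flag before importing huggingface_hub, so the
# library sees it when it configures its HTTP layer (assumption).
os.environ["HF_DEBUG"] = "1"

# from huggingface_hub import InferenceClient  # import *after* setting the flag
print(os.environ["HF_DEBUG"])
```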
3. Run three test calls with the same payload
Use the same MODEL and messages, changing only provider:
```python
from huggingface_hub import InferenceClient

API_KEY = "..."
ORG = "your-org-name"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
MESSAGES = [{"role": "user", "content": "write a poem about the sea"}]

for provider in ["auto", "sambanova", "novita"]:
    print(f"\n=== provider={provider!r} ===")
    client = InferenceClient(
        api_key=API_KEY,
        provider=provider,
        bill_to=ORG,
    )
    try:
        resp = client.chat_completion(
            model=MODEL,
            messages=MESSAGES,
            max_tokens=256,
        )
        print("OK:", resp["choices"][0]["message"]["content"][:80])
    except Exception as e:
        print("ERROR:", repr(e))
```
Interpretation:
- provider="auto" fails with the Novita 403 while provider="sambanova" succeeds → the auto router is picking a disabled provider (the reported bug).
- provider="sambanova" also fails → the problem is in the provider/org setup, not in the routing.
- provider="novita" failing with the same 403 is expected, since it is disabled for the org.
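If you record whether each of the three calls succeeded, the interpretation can be captured in a tiny helper (hypothetical function and dict shape, not part of huggingface_hub):

```python
def diagnose(results: dict) -> str:
    """Map per-provider success flags from the loop above to a likely diagnosis.

    `results` looks like {"auto": False, "sambanova": True, "novita": False},
    where each value records whether that test call succeeded.
    """
    if results["sambanova"] and not results["auto"]:
        return "routing bug: auto picks a disabled provider"
    if not results["sambanova"]:
        return "provider/org setup problem, not routing"
    return "auto routing looks fine"

print(diagnose({"auto": False, "sambanova": True, "novita": False}))
# -> routing bug: auto picks a disabled provider
```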
4. Inspect the logged curl command
From the HF_DEBUG output:
- Copy the curl for the failing provider="auto" call. You should see a URL like .../novita/v3/openai/chat/completions.
- Run it manually with curl -v and note:
  - the status code (403);
  - any X-Request-Id header (for HF support).

Then:
- Take the same curl and change only novita → sambanova in the URL.
- Run it again; if that works, you have a minimal “same request, different provider” repro.
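Swapping the provider segment can also be scripted if you want to generate both requests programmatically. A sketch, assuming the router URL shape shown in the error above (provider as the first path segment):

```python
from urllib.parse import urlsplit, urlunsplit

def swap_provider(url: str, new_provider: str) -> str:
    """Replace the first path segment (the provider) of a router URL."""
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    segments[0] = new_provider  # e.g. novita -> sambanova
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))

url = "https://router.huggingface.co/novita/v3/openai/chat/completions"
print(swap_provider(url, "sambanova"))
# -> https://router.huggingface.co/sambanova/v3/openai/chat/completions
```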
5. Verify UI settings once
- In org settings: Novita disabled, SambaNova enabled.
- In personal Inference Provider settings: SambaNova enabled; ideally Novita disabled or at least lower priority.
6. Decide next step
If the step-4 repro shows provider="auto" hitting a disabled provider, report it to HF with the X-Request-Id and both curl commands; if the direct sambanova call also fails, recheck the settings from step 5 first.
That is the simplest end-to-end debug path: it both proves the behavior and gives HF exactly what they need to investigate.