> "I have a Pro subscription, and I am trying to access the `Wan-AI/Wan2.2-TI2V-5B` model (or `tencent/HunyuanVideo`) via an asynchronous POST request to `https://router.huggingface.co/hf-inference` using my Bearer token. I am consistently receiving a 404 Not Found error, which suggests a routing or mapping issue on the server's end."
Usage of Inference Providers has some quirks.
The 404 is expected. You are calling the wrong endpoint and the wrong provider for those models. The router is behaving as designed; it is not a mapping bug on Hugging Face’s side.
I will break this into:
- What your current call is doing.
- How Hugging Face’s router + providers actually work.
- Why `Wan-AI/Wan2.2-TI2V-5B` and `tencent/HunyuanVideo` 404 on `hf-inference`.
- How to call these models correctly (sync + async).
- How to keep your “asynchronous POST” architecture without fighting the API.
1. What you are doing now
You described:
- You have a Pro subscription.
- You send an asynchronous POST (from your own code) to:

  ```
  POST https://router.huggingface.co/hf-inference
  Authorization: Bearer hf_...
  Content-Type: application/json
  ```

- You get `404 Not Found` consistently.
Two key problems at once:
- `https://router.huggingface.co/hf-inference` by itself is not a valid inference route; it is just a prefix. For HF Inference, official examples use a pattern like:

  ```
  curl https://router.huggingface.co/hf-inference/models/facebook/bart-large-mnli \
    -H "Authorization: Bearer $HF_TOKEN" \
    -d '{"inputs": "...", "parameters": {...}}'
  ```

- The models you want, `Wan-AI/Wan2.2-TI2V-5B` and `tencent/HunyuanVideo`, are not hosted by the `hf-inference` provider. They're hosted by third-party Inference Providers such as Fal, WaveSpeedAI, Novita, etc. (Hugging Face)
So even if you "fixed" the path to `POST https://router.huggingface.co/hf-inference/models/Wan-AI/Wan2.2-TI2V-5B`, you would still get a 404, because there is no HF-Inference deployment for that model.
2. Background: router + providers (why /hf-inference often 404s)
2.1 Three layers in the current Hugging Face stack
Modern Hugging Face inference is structured like this:
- Router (`https://router.huggingface.co`)
  - A "switchboard" that forwards requests to different backends called Inference Providers.
  - It is used by the Python `InferenceClient`, the JS client, and OpenAI-style `/v1/...` APIs. (Hugging Face)
- Inference Providers (Fal, WaveSpeedAI, Novita, Together, HF Inference, etc.)
  - Each provider supports particular tasks (chat, text-to-image, text-to-video…).
  - The providers table shows, for example, that Fal, Novita and WaveSpeed support Text to Video. HF Inference also supports text-to-video as a task, but not for every model. (Hugging Face)
- HF Inference
  - This is just one provider in that table ("HF Inference").
  - It replaces the old `api-inference.huggingface.co`, but it only hosts a limited set of "warm" models. Many Hub models are only available via third-party providers, not via HF Inference. (Hugging Face Forums)
2.2 What /hf-inference actually means
- `https://router.huggingface.co/hf-inference/...` = "send this request to the HF Inference provider."
- That path is only valid in specific shapes, for example:

  ```
  /hf-inference/models/<model-id>
  /hf-inference/models/<model-id>/v1/chat/completions
  ```

- Calling the bare root `/hf-inference` will always 404; it is not a complete resource path.
So conceptually:
- Router = front door.
- `/hf-inference` = "use the HF Inference backend, not Fal/Novita/etc."
- A 404 here usually means "this path/model does not exist on HF Inference."
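To keep that path discipline in hand-written code, you can centralize URL construction in a single helper. The following is an illustrative sketch of my own (the helper name and validation are not an official Hugging Face API); it only encodes the documented route shapes `/hf-inference/models/<model-id>` and `.../v1/chat/completions`:

```python
# Illustrative helper (not an official API): builds the documented
# HF Inference route shapes so you never POST to the bare prefix.
ROUTER = "https://router.huggingface.co"

def hf_inference_url(model_id: str, chat: bool = False) -> str:
    """Return /hf-inference/models/<model-id>[/v1/chat/completions]."""
    if not model_id or "/" not in model_id:
        raise ValueError(f"Expected a namespaced id like 'org/name', got: {model_id!r}")
    path = f"{ROUTER}/hf-inference/models/{model_id}"
    return f"{path}/v1/chat/completions" if chat else path

print(hf_inference_url("facebook/bart-large-mnli"))
```

This does not make an unsupported model work, but it guarantees that a 404 you do get is about the deployment, not a malformed path.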
3. Where Wan2.2 and HunyuanVideo actually live
3.1 Evidence from provider and model listings
From the Fal and WaveSpeed provider/model lists: (Hugging Face)
- `Wan-AI/Wan2.2-TI2V-5B` is explicitly listed as a Text-to-Video model available via Fal and WaveSpeedAI (and other providers).
- `tencent/HunyuanVideo` is also listed as Text-to-Video on Fal and appears in the text-to-video model index with multiple providers. (Hugging Face)
From the WaveSpeed provider doc: (Hugging Face)
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="wavespeed",
    api_key=os.environ["HF_TOKEN"],
)

video = client.text_to_video(
    "A young man walking on the street",
    model="Wan-AI/Wan2.2-TI2V-5B",
)
```
From the Fal/HunyuanVideo ecosystem: (fal.ai)
- Fal advertises `fal-ai/hunyuan-video` as a text-to-video endpoint.
- The Hugging Face model listings show `tencent/HunyuanVideo` as a text-to-video model available via Fal and other providers.
The important point: all the official code examples for these models use `provider="fal-ai"` or `provider="wavespeed"` (or similar), never `provider="hf-inference"`.
So when you target `https://router.huggingface.co/hf-inference`, you are explicitly asking to use the wrong provider for these models.
4. Why your specific request returns 404
Putting it together:
- Invalid path shape
  - `POST https://router.huggingface.co/hf-inference` → no `/models/<id>` and no task suffix.
  - That path is not defined in the HF router; 404 is the correct HTTP response.
- Wrong provider for these models
  - `Wan-AI/Wan2.2-TI2V-5B` and `tencent/HunyuanVideo` are deployed on Fal/WaveSpeed/Novita, not on HF Inference. (Hugging Face)
  - Even if you send `POST https://router.huggingface.co/hf-inference/models/Wan-AI/Wan2.2-TI2V-5B`, HF Inference has no such deployment → 404.
- This is a documented migration pattern
  - Hugging Face forum threads about the old `api-inference` endpoint show people hitting `router.huggingface.co/hf-inference` and seeing 404 until they switch to proper Inference Providers usage (`InferenceClient` with `provider="..."`). (Hugging Face Forums)
So your 404 is basically:
“The HF Inference provider does not know about this path/model. Please call a supported provider or use the documented client.”
It is not a bug or a misrouting on HF’s side; it’s a mismatch between your chosen endpoint/provider and where those models actually run.
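When debugging raw router calls, it also helps to separate "wrong path/provider" from auth or capacity problems before assuming a server-side bug. This is a hand-rolled triage table of my own, based on common HTTP semantics, not an official Hugging Face error reference:

```python
# Rough triage map for HTTP statuses from the router (illustrative,
# not exhaustive and not an official HF error catalogue).
STATUS_HINTS = {
    401: "Token missing or invalid - check the Authorization: Bearer header.",
    403: "Token lacks permission for this model or provider.",
    404: "Path or model not deployed on this provider - check the provider and the /models/<id> path.",
    503: "Model loading or provider busy - retry with backoff.",
}

def explain_status(code: int) -> str:
    """Map a status code to a debugging hint."""
    return STATUS_HINTS.get(code, f"Unexpected status {code}; inspect the response body.")

print(explain_status(404))
```

With this framing, your consistent 404 lands squarely in the "wrong provider/path" bucket rather than the auth bucket.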
5. How to call these models correctly
5.1 Recommended: Python InferenceClient.text_to_video(...)
The simplest and most stable approach is to stop hand-crafting HTTP calls and use the official client, which already knows how to talk to the router and providers. (Hugging Face)
Example for Wan2.2 (WaveSpeed)
```python
import os
from huggingface_hub import InferenceClient

HF_TOKEN = os.environ["HF_TOKEN"]

client = InferenceClient(
    provider="wavespeed",  # or "fal-ai", "novita", etc. - all listed in the docs
    api_key=HF_TOKEN,
)

video_bytes = client.text_to_video(
    "A cinematic shot of a city at sunset, drone view",
    model="Wan-AI/Wan2.2-TI2V-5B",
)

with open("wan22.mp4", "wb") as f:
    f.write(video_bytes)
```
This is almost exactly the snippet in the WaveSpeed provider docs, just with a different prompt. (Hugging Face)
Example for HunyuanVideo (Fal)
```python
import os
from huggingface_hub import InferenceClient

HF_TOKEN = os.environ["HF_TOKEN"]

client = InferenceClient(
    provider="fal-ai",  # HunyuanVideo is available via Fal and other providers
    api_key=HF_TOKEN,
)

video_bytes = client.text_to_video(
    "A dragon flying above mountains in stormy weather",
    model="tencent/HunyuanVideo",
)

with open("hunyuan.mp4", "wb") as f:
    f.write(video_bytes)
```
HunyuanVideo is listed as text-to-video on Fal and in the HF model index; this pattern follows the official Text-to-Video and provider docs. (fal.ai)
Using automatic provider selection
On current `huggingface_hub` versions, you can also let HF pick the provider:

```python
client = InferenceClient(
    provider="auto",  # or just omit provider; HF chooses based on model + task
    api_key=HF_TOKEN,
)

video_bytes = client.text_to_video(
    "A close-up of waves crashing on rocks, slow motion",
    model="Wan-AI/Wan2.2-TI2V-5B",
)
```
Internally, this still goes through `https://router.huggingface.co`, but you no longer worry about `/hf-inference` vs Fal vs WaveSpeed; the client handles routing.
5.2 Keeping your code “asynchronous” safely
The Hugging Face Text-to-Video API itself is synchronous: one HTTP request in, video bytes back when ready. The “asynchronous” part should live in your own application layer.
A practical pattern in Python:
```python
import asyncio
import os

from huggingface_hub import InferenceClient

HF_TOKEN = os.environ["HF_TOKEN"]

client = InferenceClient(
    provider="wavespeed",
    api_key=HF_TOKEN,
)

async def generate_video_async(prompt: str) -> bytes:
    # Run the blocking call in a worker thread so your event loop is not blocked.
    # Note: model is a keyword-only argument of text_to_video.
    return await asyncio.to_thread(
        client.text_to_video,
        prompt,
        model="Wan-AI/Wan2.2-TI2V-5B",
    )

# Example usage:
# video = await generate_video_async("A robot walking in a neon-lit alley")
```
You can also put this call behind a job queue:
- Your frontend sends an async POST to your own `/video_jobs` endpoint.
- Your backend enqueues a job.
- A worker process calls `client.text_to_video(...)` (blocking is fine here).
- The worker stores the video and marks the job as done.
- Your frontend polls or subscribes to a status endpoint.
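The queue pattern above can be sketched in-process with only the standard library. Here `fake_text_to_video` is a stand-in for the real blocking `client.text_to_video(...)` call, and the job-id/status structure is my own illustrative design, not an HF API:

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}              # job_id -> {"status": ..., "result": ...}
work_q: "queue.Queue[str]" = queue.Queue()

def fake_text_to_video(prompt: str) -> bytes:
    # Stand-in for the blocking client.text_to_video(...) call.
    return f"video-for:{prompt}".encode()

def submit(prompt: str) -> str:
    """What your POST /video_jobs handler would do: enqueue and return an id."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "prompt": prompt, "result": None}
    work_q.put(job_id)
    return job_id

def worker() -> None:
    # A worker process/thread drains the queue; blocking calls are fine here.
    while True:
        job_id = work_q.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = fake_text_to_video(jobs[job_id]["prompt"])
        jobs[job_id]["status"] = "done"
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()
jid = submit("a city at sunset")
work_q.join()                            # in real code, the frontend polls instead
print(jobs[jid]["status"])
```

In production you would swap the dict for a database and the thread for a real worker (Celery, RQ, a systemd service), but the flow is the same.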
Key idea: do not try to make `https://router.huggingface.co/hf-inference` itself behave like an async job queue. Treat HF's call as a long-running synchronous operation, and build your async behavior around it.
5.3 If you absolutely must use raw HTTP
For languages with no HF client, you should still avoid guessing `/hf-inference` paths.
Options:
- Use Hugging Face's router via `InferenceClient` in a small sidecar service (Python/Node), and talk HTTP/gRPC to that sidecar from your main stack.
- Or call provider APIs directly (Fal, WaveSpeed, Novita) using their documented HTTP endpoints and keys. For example, Fal exposes `fal-ai/hunyuan-video` with its own JSON schema and endpoint. (fal.ai)
Both are more stable than reverse-engineering router URLs, which can change as HF evolves the platform.
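If you do go raw-HTTP anyway, build and inspect the request locally before sending it, so path mistakes surface as assertion failures rather than remote 404s. This stdlib sketch (my own construction, assuming the documented `/hf-inference/models/<id>` route shape and a generic JSON payload) prepares the request without sending it:

```python
import json
import urllib.request

def build_router_request(model_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (do not send) a POST against the HF Inference route shape."""
    url = f"https://router.huggingface.co/hf-inference/models/{model_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_router_request("facebook/bart-large-mnli", "hf_xxx", {"inputs": "hello"})
print(req.full_url)
# Sending is a separate, explicit step:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```

Remember that a well-formed URL still 404s if the model is not deployed on HF Inference; for Wan2.2 and HunyuanVideo you would point this at a provider's own documented endpoint instead.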
6. How to map your current code to the correct design
You can think of your migration in three concrete steps:
- Stop calling `POST https://router.huggingface.co/hf-inference`
  - That route is incomplete and points to the wrong provider for these models.
- Switch to the documented Text-to-Video client
  - Python/Node: `InferenceClient(..., provider="wavespeed" | "fal-ai" | "novita" | "auto")` + `text_to_video(..., model="Wan-AI/Wan2.2-TI2V-5B" or "tencent/HunyuanVideo")`. (Hugging Face)
- Wrap that call in your own async job pattern
  - Background workers / `asyncio.to_thread` / queue.
  - Your client still does an "asynchronous POST", but that POST is to your backend, not directly to `/hf-inference`.
7. Minimal, concrete “fix” summary
- The 404 is expected because:
  - `/hf-inference` with no `/models/...` is not a valid endpoint. (Hugging Face)
  - `Wan-AI/Wan2.2-TI2V-5B` and `tencent/HunyuanVideo` are served by providers like Fal and WaveSpeed, not by HF Inference. (Hugging Face)
- Use the official text-to-video client instead of manual POSTs:

  ```python
  import os
  from huggingface_hub import InferenceClient

  HF_TOKEN = os.environ["HF_TOKEN"]

  client = InferenceClient(
      provider="wavespeed",  # or "fal-ai", "novita", "auto"
      api_key=HF_TOKEN,
  )
  video = client.text_to_video(
      "A young man walking on the street",
      model="Wan-AI/Wan2.2-TI2V-5B",
  )
  ```

- Keep your application asynchronous by wrapping this call in a background worker or `asyncio.to_thread`, rather than trying to make `/hf-inference` itself into an async job endpoint.