HF Agents Course 404 Client Error: Not Found for url

Hey guys

I’m struggling with this error:

404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions

The code is taken from the Agents course.

It appears with every instruct model I tried (including gated models such as Llama).

What does this error mean?

I'd be grateful for any help.

I saw there might be a problem with zero-scale endpoints or something like that, but I used popular models, so I'm not sure that's the reason.


I think this is because deployment has been canceled for a large number of models, and because of major changes to the library used for the Inference API. I'm not familiar with the workaround for this issue in LlamaIndex, but according to GitHub, updating the HF library should still make it work.

To update the huggingface_hub library:

pip install -U huggingface_hub
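
To confirm the upgrade took effect, you can print the installed version from Python (__version__ is a standard attribute of the package):

import huggingface_hub

# Print the installed version to confirm the upgrade took effect
print(huggingface_hub.__version__)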

Hi, thanks for your answer!
Unfortunately, updating didn't help; I've already tried it.


Hmm, in that case, does LlamaIndex itself need updating, or has it become unusable due to further specification changes…?
I think the model itself is deployed via an Inference Provider.

However, if you are not particularly attached to that model, it might be better to look for an alternative. More detailed information is available in the Agents course channel on the Hugging Face Discord.

Alternative API Endpoints / local models for smolagents
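
For smolagents specifically, switching away from the default hf-inference provider looks roughly like this. This is only a sketch, assuming a recent smolagents release where InferenceClientModel accepts a provider argument (the provider name here is just an illustrative choice):

import os
from smolagents import InferenceClientModel

# Route requests through an explicit Inference Provider instead of hf-inference
model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="together",  # assumption: use any provider that actually serves the model
    token=os.environ["HF_TOKEN"],
)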

Everything is up to date.

Actually, I'm already using some other models directly, but I still want to get to the bottom of this problem. Maybe someone knows how to fix it.

Thank you anyway


https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions

hf-inference <= this part

I see. Let me explain the situation. It is normal for this URL not to work, because this model is not deployed with HF Inference. Currently, very few LLMs are deployed via HF Inference; most are served through other Inference Providers.

If LlamaIndex does not have a way to switch the Inference Provider or to set it to "auto", only a few models will work. You can check which providers serve a given model yourself; see the sketch below.
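
A minimal way to check, assuming a recent huggingface_hub version (the expand parameter and the inference_provider_mapping field reflect my understanding of the current API, so treat this as a sketch):

import os
from huggingface_hub import model_info

# Ask the Hub which Inference Providers currently serve this model
info = model_info(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    expand=["inferenceProviderMapping"],
    token=os.environ.get("HF_TOKEN"),
)
print(info.inference_provider_mapping)

If "hf-inference" does not appear in that mapping, a 404 from the hf-inference URL is expected.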


Yes, I think you're right, and the problem is in the framework or somewhere nearby. I just don't understand why they put this example in the course.
Actually, the model should be usable via the Inference API, because there is code for calling it:

import os
from huggingface_hub import InferenceClient

# "auto" lets the client pick whichever Inference Provider hosts the model
client = InferenceClient(
    provider="auto",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

# The assistant's reply is in the first choice
print(completion.choices[0].message)

But maybe this is the only way to call it now, and HuggingFaceInferenceAPI is restricted (even though that code is in the course).
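
If the goal is just to keep using this model from LlamaIndex, one possible workaround is to bypass HuggingFaceInferenceAPI and point an OpenAI-compatible client at the router. A hedged sketch, assuming the router's OpenAI-compatible endpoint at https://router.huggingface.co/v1 and the llama-index-llms-openai-like package; I haven't verified this against the current LlamaIndex release:

import os
from llama_index.llms.openai_like import OpenAILike

# Treat the HF router as a generic OpenAI-compatible backend;
# the router then dispatches to a provider that hosts the model
llm = OpenAILike(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    api_base="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
    is_chat_model=True,  # send requests to /chat/completions
)

print(llm.complete("What is the capital of France?"))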


I just don't understand why they put this example in the course.

Yeah. When the course was created, that method was available…
If it were just a matter of library versions, we could simply stick with the old ones, but for the "Agents" course we need as many examples as possible of using external APIs, whether provided by HF or a third party…

But AI services change a lot in just a few months. It’s difficult to keep them up to date.


Agreed. But it could at least be mitigated by linking discussions of problems & solutions on this forum, for instance. Just one button on the page, "Got stuck, but found a solution? Tell us more," or something like that. I saw the same thing on another platform. Or just a little checklist of problems that may come up: check that you have Pro status to use the HF Inference API, check the Deploy button, etc.

No complaints toward the authors; there are always ways to make a course better.

Thanks for your help!
