03 Error When Using Qwen/Qwen2.5-VL-32B-Instruct with Inference Provider

It’s most likely not your problem. I think we just have to fix the Space side…


1. What you are doing and what is failing

You are:

  • Taking the Hugging Face AI Agents Course – Unit 2.1 smolagents Code Quiz Space
    (agents-course/unit2_smolagents_quiz).(Hugging Face)

  • In your notebook code, you configure a smolagents model:

    from smolagents import InferenceClientModel
    
    model = InferenceClientModel(
        mprovider="auto",
        api_key="KEY",
    )
    
  • When you submit code in the quiz UI, you get a warning:

    Error generating feedback: 410 Client Error: Gone for
    Qwen2.5-Coder-32B-Instruct is no longer supported. Please use router.huggingface.co instead.

Important:
This error is not coming from your InferenceClientModel in the notebook.
It comes from the quiz Space’s own grader.


2. How the quiz Space is wired internally

If we inspect the Space’s app.py, we see:(Hugging Face)

from huggingface_hub import InferenceClient

HF_TOKEN = os.getenv("HF_TOKEN")
HF_API_URL = os.getenv("HF_API_URL", "Qwen/Qwen2.5-Coder-32B-Instruct")

# This client is used by the grader
client = InferenceClient(model=HF_API_URL, token=HF_TOKEN)

Then, for grading your answer:

response = client.text_generation(
    prompt=prompt,
    grammar={
        "type": "json_object",
        "value": CodeFeedback.model_json_schema(),
    },
)

and if anything goes wrong:

except Exception as e:
    gr.Warning(f"Error generating feedback: {str(e)}")

So the pipeline is:

  1. You write code and click “Next”.
  2. The Space calls its own InferenceClient with model
    "Qwen/Qwen2.5-Coder-32B-Instruct".
  3. That call fails with an HTTP 410.
  4. The error message is wrapped into gr.Warning(...), which is what you see.

Your InferenceClientModel in the exercise is separate. It is only used when your own smolagent runs. The grader is a different client object.
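
If you want to convince yourself of this, you can reproduce the grader-style call directly in a notebook. This is only a minimal sketch (not the Space’s code); depending on your huggingface_hub version and token, the call may succeed via a provider or fail with the same 410 wording:

import os
from huggingface_hub import InferenceClient

# Same model and client type as the grader; token read from your environment.
client = InferenceClient(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    token=os.getenv("HF_TOKEN"),
)

try:
    client.text_generation("def add(a, b):", max_new_tokens=16)
except Exception as e:
    # On an outdated client/path this surfaces the same "410 ... router.huggingface.co" message.
    print(e)

Either way, the result tells you about the grader-style call, not about your own InferenceClientModel.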


3. Background: what “410 Gone” means here

3.1 HTTP level

  • 410 Gone = “this resource used to exist but is permanently removed.”
  • It’s stronger than 404 (Not Found) and typically used during deprecations.

3.2 Hugging Face context (old vs new inference)

Historically:

  • People called:
    https://api-inference.huggingface.co/models/<model-id>
    (the old “Inference API / serverless” endpoint).
  • Hugging Face has been retiring that endpoint in favor of the router-based Inference Providers API at:
    https://router.huggingface.co/... (Hugging Face Forums)

Recently:

  • Requests to https://api-inference.huggingface.co/... now often return:

    {
      "error": "https://api-inference.huggingface.co is no longer supported.
                Please use https://router.huggingface.co/hf-inference instead."
    }
    
  • This is returned with 410 to clearly signal “this path is gone”.

So whenever you see this type of message, it means:

“You (or some library under the hood) are still calling the old Inference API endpoint.
You must switch to the new router-based Inference Providers API.”
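
You can also see this at the raw HTTP level. A minimal sketch with requests (no Hugging Face client involved); the exact status code and body are whatever the legacy endpoint returns when you run it, and an unauthenticated call may be rejected earlier:

import requests

# Legacy serverless endpoint (deprecated) for the same model the grader uses.
url = "https://api-inference.huggingface.co/models/Qwen/Qwen2.5-Coder-32B-Instruct"

resp = requests.post(url, json={"inputs": "def add(a, b):"}, timeout=30)
print(resp.status_code)  # reports describe 410 here
print(resp.text)         # and a JSON body pointing to router.huggingface.co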


4. What about Qwen2.5-Coder-32B-Instruct itself?

The model card for Qwen/Qwen2.5-Coder-32B-Instruct is still live and presents this 32B code model as a state-of-the-art open model for coding tasks.(Hugging Face)

So:

  • The model is not globally “dead”.
  • What is “no longer supported” is the old way of calling it via the deprecated API.

In other words, the message:

“Qwen2.5-Coder-32B-Instruct is no longer supported. Please use router.huggingface.co instead.”

should be read as:

“This model is no longer supported via this particular legacy API path.
Use the router / Inference Providers setup instead.”


5. Why changing your InferenceClientModel did not fix it

Your code:

model = InferenceClientModel(
    mprovider="auto",
    api_key="KEY",
)

There are two separate issues here.

5.1 A small bug in your code (but not the cause of the 410)

In smolagents, the parameter name is provider, not mprovider.
The docs show usage like:(Hugging Face)

InferenceClientModel(
    model_id="...",  # optional
    provider="auto",
    api_key="hf_...",
)

So your code should be:

model = InferenceClientModel(
    provider="auto",
    api_key="KEY",
)

This is worth fixing, but even if you fix it, it will not solve the 410 you see in the quiz.

5.2 Two independent model calls

There are two different “model calls” in play:

  1. Your smolagent model (inside your notebook)
    – configured with InferenceClientModel(provider="auto", ...).
    This is for agent reasoning and tool calls.

  2. The quiz grader model (inside app.py in the Space)
    – configured with InferenceClient(model="Qwen/Qwen2.5-Coder-32B-Instruct", token=HF_TOKEN)
    and called via client.text_generation(...).(Hugging Face)

Your modification only affects (1).
The error you are seeing comes from (2).

That’s why “configuring the default InferenceClientModel” in your solution does not change the warning about Qwen2.5-Coder-32B-Instruct.


6. Why the Space’s grader is hitting a dead path

We know:

  • The Space uses huggingface_hub.InferenceClient.(Hugging Face)
  • Modern InferenceClient is designed to talk to the new router-based Inference Providers API.(Hugging Face)
  • The error string about “no longer supported, use router.huggingface.co” is characteristic of calls still going through the old api-inference.huggingface.co domain.(Hugging Face Forums)

Most likely scenario (based on current HF migration docs and recent issues):

  1. The Space (or the huggingface_hub client version that its Docker image actually runs) is still using, or falling back to, the legacy Inference API path when calling "Qwen/Qwen2.5-Coder-32B-Instruct" via text_generation.
  2. That legacy path has been permanently shut down for new accounts / tokens and now returns 410 with guidance to switch to the router.
  3. Therefore, the Space’s grading backend is out of date with Hugging Face’s current inference stack.

This matches what we see in other public reports: as HF retires the legacy Inference API, many apps built on that API start returning 404/410 with messages pointing to router.huggingface.co.(Hugging Face Forums)

So the root cause is:

The quiz Space is still calling this Qwen model through an API path that Hugging Face has deprecated and removed.
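
A quick supporting diagnostic, if you duplicate the Space (see 7.2 below), is to log which huggingface_hub version the image actually runs. A minimal sketch:

import huggingface_hub

# Older releases predate the router-based Inference Providers behavior,
# so an old version here is consistent with the 410 the grader reports.
print(huggingface_hub.__version__)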


7. What you can actually do (practically)

7.1 If you just want to finish the official quiz

You cannot, from the UI, change:

  • The HF_API_URL value (the model id Qwen/Qwen2.5-Coder-32B-Instruct).
  • The way InferenceClient is configured inside app.py.

Therefore:

  • As long as the official Space is not updated, the feedback step will continue to fail with a 410.
  • This does not mean your solution is wrong; it means the grading infrastructure is broken.

Concrete options:

  • Use the quiz mainly to write and run code locally, and treat the feedback as “best effort” that may fail.

  • Optionally open an issue:

    • Either in the Space “Community” tab, or
    • On the forum (similar to other course-bug posts).(Hugging Face)

In other words: there is nothing you can fix from within the exercise code itself to stop that warning in the official Space.

7.2 If you duplicate the Space and fix it yourself

If you want a working version under your own account:

  1. Go to the Space page (agents-course/unit2_smolagents_quiz).(Hugging Face)
  2. Click “Duplicate this Space”.
  3. In your copy, edit app.py and update the grader.

There are two main fix patterns.

Fix pattern A: switch to another provider-backed model

For example, replace:

HF_API_URL = os.getenv("HF_API_URL", "Qwen/Qwen2.5-Coder-32B-Instruct")
client = InferenceClient(model=HF_API_URL, token=HF_TOKEN)

with something like:

from huggingface_hub import InferenceClient

HF_TOKEN = os.getenv("HF_TOKEN")

client = InferenceClient(
    provider="hf-inference",  # or a specific third-party provider
    model="meta-llama/Llama-3.1-8B-Instruct",  # or another supported chat model
    api_key=HF_TOKEN,
)

and then either:

  • keep text_generation(...) if it still works with that provider/model pair, or
  • migrate to the chat.completions.create(...) / chat_completion(...) API, which is now the recommended way to talk to LLMs via Inference Providers; a minimal sketch follows below.(Hugging Face)
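
For the second option, here is a minimal migration sketch for the duplicated Space, reusing the client from Fix pattern A and the prompt variable from the grader excerpt in section 2. The JSON grammar used by the original grader would need to be re-expressed through the provider’s structured-output support, which varies by provider, so this sketch just requests plain text:

# Hypothetical replacement for the grader's text_generation(...) call.
messages = [{"role": "user", "content": prompt}]

response = client.chat_completion(
    messages=messages,
    max_tokens=512,
)

feedback_text = response.choices[0].message.content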

Fix pattern B: keep Qwen2.5-Coder but specify a provider

From the Qwen2.5-Coder-32B-Instruct model card, you can see that it is available through Inference Providers (e.g. via providers such as Fireworks and Nscale).(Hugging Face)

Then in your own Space copy, do:

client = InferenceClient(
    provider="fireworks-ai",  # example; choose a provider that supports this model
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    api_key=HF_TOKEN,         # HF token with Inference Providers permissions
)

This makes sure:

  • The request goes through the router,
  • It uses a concrete provider that you have access to,
  • It no longer hits the deprecated old API.

This pattern is exactly how others successfully call Qwen2.5-VL/2.5-Coder through providers in the forum thread you linked.(Hugging Face Forums)
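
Before wiring the new client back into the grading function, a quick smoke test is worth doing. A minimal sketch; note that not every provider exposes text_generation, so a failure here may simply mean you should switch to chat_completion or pick another provider:

try:
    out = client.text_generation("def add(a, b):", max_new_tokens=32)
    print(out)
except Exception as e:
    # If the provider only supports chat, call client.chat_completion(...) instead.
    print(f"Not usable as configured: {e}")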


8. How to correctly use InferenceClientModel in your own code

Even though this does not fix the quiz’s grader, it is still worth correcting your own code so that you are set up correctly for later units.

A more robust configuration:

from smolagents import InferenceClientModel

model = InferenceClientModel(
    # Optional: set a model_id explicitly
    # model_id="Qwen/Qwen2.5-Coder-14B-Instruct",
    provider="auto",     # or "hf-inference" or a concrete provider like "fireworks-ai"
    api_key="hf_...",    # your HF token with Inference Providers enabled
)

Key points:

  • Use the argument name provider, not mprovider.
  • provider="auto" lets Hugging Face choose among providers allowed in your account; you can explicitly pick one if you want more control.(Hugging Face)

This is aligned with the new Inference Providers + router architecture and will work consistently across your own projects, independent of the course Space.
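
As a quick end-to-end check of that configuration, here is a minimal sketch in the spirit of the course exercises (the task string is arbitrary):

from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel(
    provider="auto",
    api_key="hf_...",  # HF token with Inference Providers permission
)

# A bare CodeAgent with no tools is enough to verify that the model call works.
agent = CodeAgent(tools=[], model=model)
agent.run("What is the 10th Fibonacci number?")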


9. Short recap (bullet summary)

  • The error you see in the quiz Space comes from the grader’s own InferenceClient that uses "Qwen/Qwen2.5-Coder-32B-Instruct" in app.py, not from your smolagents InferenceClientModel.(Hugging Face)

  • HTTP 410 + “use router.huggingface.co” means “you are calling the deprecated api-inference.huggingface.co API; you must use the router-based Inference Providers API now.”(Hugging Face Forums)

  • Qwen2.5-Coder-32B-Instruct is still available on the Hub; what is “not supported” is the old way of calling it, not the model itself.(Hugging Face)

  • Changing your InferenceClientModel inside the exercise does not change the grader’s model; they are separate clients.

  • To actually fix the error, the Space must be updated to use the router-based Inference Providers API:

    • either by switching to a different provider-backed model, or
    • by calling Qwen/Qwen2.5-Coder-32B-Instruct via InferenceClient(provider=..., model=..., api_key=...).(Hugging Face)
  • You can:

    • report this as a course Space bug, and/or
    • duplicate the Space and patch app.py yourself.
  • In your own code, use provider="auto" instead of mprovider, and give a valid HF token with Inference Providers permission.