Hello! I believe some of the inference endpoints currently have “Scale to zero” enabled temporarily, meaning they spin down after a period of no usage. The first request after that will be slow or fail, but subsequent ones will work. We’re going to disable scale to zero again so that this is no longer an issue, apologies for the inconvenience. cc @ VB can you update the scale to zero for the big ST models that already had APIs?
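In the meantime, cold starts from scale-to-zero can be worked around client-side by retrying with backoff until the endpoint is warm. A minimal sketch (the helper name, delays, and the stub request are my own for illustration, not an official HF API):

```python
import time

def retry_on_cold_start(fn, attempts=5, base_delay=2.0):
    """Retry fn() with exponential backoff, for endpoints that scale to zero.

    fn should raise while the endpoint is still waking up and return its
    result once the first successful response comes back.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Cold starts can take a minute or more, so back off generously.
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stub that "fails" twice before the endpoint is warm:
calls = {"n": 0}

def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503: model is loading")
    return "ok"

result = retry_on_cold_start(fake_request, attempts=5, base_delay=0.0)
```

In real use, `fn` would be the actual request to the endpoint; the point is simply that the first one or two attempts after idle are expected to fail.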
As for my private model, “Scale to zero” was always the case - yes, I needed to wait 1-3 minutes for the first response. But right now it returns a 404. The issue described on Discord is different.
I am still receiving the “404 Client Error: Not Found for url” error.
Is there any way to fix it?
I can’t even access “https://api-inference.huggingface.co” with a web browser.
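For what it’s worth, as far as I can tell the bare api-inference.huggingface.co root isn’t meant to be opened in a browser; requests go to `/models/<model_id>`. A small sketch of building the URL and reading common status codes (the status interpretations are my assumptions from typical behaviour, not official documentation, and the model id is just an example):

```python
# Build the HF Inference API URL for a model and interpret common
# status codes seen in this thread.
API_ROOT = "https://api-inference.huggingface.co/models"

def inference_url(model_id: str) -> str:
    return f"{API_ROOT}/{model_id}"

def describe_status(code: int) -> str:
    if code == 200:
        return "ok: model responded"
    if code == 404:
        return "not found: model id is wrong or no longer served by HF Inference"
    if code == 503:
        return "loading: endpoint is cold, retry after a short wait"
    return f"unexpected status {code}"

# Example (SDXL base, which reportedly still works):
url = inference_url("stabilityai/stable-diffusion-xl-base-1.0")
```

A plain GET to such a URL without a task payload won’t generate anything, but the status code alone distinguishes “wrong/unsupported id” (404) from “cold start” (503).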
Hi @BehNas and @RichLivMyB4! What model are you trying to use with the HF Inference API? You can find all models supported by the HF Inference API provider here: Models - Hugging Face.
Hello there, @meganariley ma’am. A lot of the issues I and many others are experiencing are with text-to-image models. The base SDXL models, as well as SD 3.5 and FLUX, are still working fine, but fine-tunes such as those based on Illustrious or Pony for SDXL no longer work with the HF Inference API. The models I uploaded for use in my CPU-based Gradio Spaces were working fine in early April, but then this issue emerged: at first an error along the lines of “model inference is not supported by HF Inference”, and now my Spaces show a “404: API not found” error.

The most I can say is that most if not all fine-tuned text-to-image models worked prior to the rework of the HF Inference API some time last year. Some functionality was restored over the following months, but the API no longer works as intended for the majority of models, and I don’t have the hardware for local operation, nor the budget to justify dedicated GPUs for my Gradio Spaces.

While I’m sure I could modify my code, with the help of chatbots like Qwen 3.0, to use a third-party API like Replicate, I would love to see the original functionality of the HF Inference API restored, assuming this isn’t actually an error but rather that the API was disabled to prevent general use by public Spaces while you all revamp it. I can’t really give you specific examples of which models aren’t working, as the number of models affected easily reaches 20,000 at least. We would love a solution that doesn’t require a PRO membership.
Yeah. I’m just an outsider, so I can only speculate. Roughly speaking, I think the situation is that “shared GPU resources were overused to the point where Hugging Face couldn’t handle it.”
The persistent cyberattacks since mid-last year likely exacerbated the situation.
That’s why I don’t intend to complain about outsourcing or reducing usage limits. I don’t have a hobby of tormenting “The Happy Prince” by Oscar Wilde…
Currently, I’m experimenting with alternatives using Zero GPU, but we’re struggling to fully replicate the traditional look and feel.
Plus, for us living in developed countries (my country is still considered developed, though the prospect of becoming a developing nation is on the horizon—but not yet…), subscribing to a Pro plan isn’t too difficult. But the idea of Hugging Face becoming a gated community for developed country residents feels really lonely emotionally.
It doesn’t have to be immediate. Eventually, if there’s any intermediate solution or relief measure, even a small one, it would give me something to look forward to each day.
mistralai/mixtral-8x7b-instruct-v01 is the exact model ref we pass, and it used to work just fine. After checking HF, I notice the repo shows v0.1, not v01. Is the dot significant, and has something changed here? The model hasn’t been updated since Aug 2024, and it worked fine until a few weeks ago.
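Repo ids on the Hub are exact strings, and the canonical id on the model page is `mistralai/Mixtral-8x7B-Instruct-v0.1` (note the dot in v0.1 and the capitalisation). Whether the router previously tolerated the dotless lowercase variant, and stopped doing so, is something HF staff would need to confirm; the safest fix is to copy the id verbatim from the model page. A quick sanity check, assuming the standard api-inference URL scheme:

```python
# The id copied from the Hub model page (dot and capitals included):
CANONICAL = "mistralai/Mixtral-8x7B-Instruct-v0.1"
# The ref currently being passed:
PASSED = "mistralai/mixtral-8x7b-instruct-v01"

def inference_url(model_id: str) -> str:
    return f"https://api-inference.huggingface.co/models/{model_id}"

# The two refs resolve to different URLs, so if the router stopped
# normalising ids, the dotless one would start 404ing:
different = inference_url(PASSED) != inference_url(CANONICAL)
```

If switching to the canonical id fixes the 404, that would strongly suggest the router’s id matching became stricter.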