Hello! I believe some of the inference endpoints currently have “Scale to zero” enabled temporarily, meaning they spin down after a period of no usage. The first request after that will be slow or fail, but subsequent ones will work. We’re going to disable scale to zero again so that this is no longer an issue, apologies for the inconvenience. cc @ VB can you update the scale to zero for the big ST models that already had APIs?
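In the meantime, cold starts from scale-to-zero can be worked around client-side by retrying with backoff until the endpoint is warm. A minimal sketch (the helper name, delays, and the stub request are my own for illustration, not an official HF API):

```python
import time

def retry_on_cold_start(fn, attempts=5, base_delay=2.0):
    """Retry fn() with exponential backoff, for endpoints that scale to zero.

    fn should raise while the endpoint is still waking up and return its
    result once the first successful response comes back.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Cold starts can take a minute or more, so back off generously.
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stub that "fails" twice before the endpoint is warm:
calls = {"n": 0}

def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503: model is loading")
    return "ok"

result = retry_on_cold_start(fake_request, attempts=5, base_delay=0.0)
```

In real use, `fn` would be the actual request to the endpoint; the point is simply that the first one or two attempts after idle are expected to fail.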
As for my private model, “Scale to zero” was always the case - yes, I needed to wait 1-3 minutes for the first response. But right now it returns a 404. The issue described on Discord is different.
I am still receiving the “404 Client Error: Not Found for url” error.
Is there any way to fix it?
I can’t even access “https://api-inference.huggingface.co” with a web browser.
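For what it’s worth, as far as I can tell the bare api-inference.huggingface.co root isn’t meant to be opened in a browser; requests go to `/models/<model_id>`. A small sketch of building the URL and reading common status codes (the status interpretations are my assumptions from typical behaviour, not official documentation, and the model id is just an example):

```python
# Build the HF Inference API URL for a model and interpret common
# status codes seen in this thread.
API_ROOT = "https://api-inference.huggingface.co/models"

def inference_url(model_id: str) -> str:
    return f"{API_ROOT}/{model_id}"

def describe_status(code: int) -> str:
    if code == 200:
        return "ok: model responded"
    if code == 404:
        return "not found: model id is wrong or no longer served by HF Inference"
    if code == 503:
        return "loading: endpoint is cold, retry after a short wait"
    return f"unexpected status {code}"

# Example (SDXL base, which reportedly still works):
url = inference_url("stabilityai/stable-diffusion-xl-base-1.0")
```

A plain GET to such a URL without a task payload won’t generate anything, but the status code alone distinguishes “wrong/unsupported id” (404) from “cold start” (503).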
Hi @BehNas and @RichLivMyB4! What model are you trying to use with the HF Inference API? You can find all models supported by the HF Inference API provider here: Models - Hugging Face.
Hello there, @meganariley ma’am. A lot of the issues I and many others are experiencing are with text-to-image models. The base SDXL models, as well as SD 3.5 and FLUX, are still working fine, but fine-tunes such as those based on Illustrious or Pony for SDXL no longer work with the HF Inference API. The models I uploaded for use in my CPU-based Gradio Spaces were working fine in early April, but then this issue emerged: at first an error along the lines of “model inference is not supported by HF Inference”, and now my Spaces show a “404: API not found” error.

The most I can say is that most if not all fine-tuned text-to-image models worked prior to the rework of the HF Inference API some time last year. Some functionality was restored over the following months, but the API no longer works as intended for the majority of models, and I don’t have the hardware for local operation, nor the budget to justify dedicated GPUs for my Gradio Spaces.

While I’m sure I could modify my code, with the help of chatbots like Qwen 3.0, to use a third-party API like Replicate, I would love to see the original functionality of the HF Inference API restored, assuming this isn’t actually an error but rather that the API was disabled to prevent general use by public Spaces while you all revamp it. I can’t really give you specific examples of which models aren’t working, as the number of models affected easily reaches 20,000 at least. We would love a solution that doesn’t require a PRO membership.
Yeah. I’m just an outsider, so I can only speculate. Roughly speaking, I think the situation is that “shared GPU resources were overused to the point where Hugging Face couldn’t handle it.”
The persistent cyberattacks since mid-last year likely exacerbated the situation.
That’s why I don’t intend to complain about outsourcing or reducing usage limits. I don’t have a hobby of tormenting “The Happy Prince” by Oscar Wilde…
Currently, I’m experimenting with alternatives using Zero GPU, but we’re struggling to fully replicate the traditional look and feel.
Plus, for us living in developed countries (my country is still considered developed, though the prospect of becoming a developing nation is on the horizon—but not yet…), subscribing to a Pro plan isn’t too difficult. But the idea of Hugging Face becoming a gated community for developed country residents feels really lonely emotionally.
It doesn’t have to be immediate. Eventually, if there’s any intermediate solution or relief measure, even a small one, it would give me something to look forward to each day.
mistralai/mixtral-8x7b-instruct-v01 is the exact model ref we pass, and it used to work just fine. After checking HF, I notice the repo shows v0.1, not v01. Is the dot significant, and has something changed here? The model hasn’t been updated since Aug 2024, and it worked fine until a few weeks ago.
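Repo ids on the Hub are exact strings, and the canonical id on the model page is `mistralai/Mixtral-8x7B-Instruct-v0.1` (note the dot in v0.1 and the capitalisation). Whether the router previously tolerated the dotless lowercase variant, and stopped doing so, is something HF staff would need to confirm; the safest fix is to copy the id verbatim from the model page. A quick sanity check, assuming the standard api-inference URL scheme:

```python
# The id copied from the Hub model page (dot and capitals included):
CANONICAL = "mistralai/Mixtral-8x7B-Instruct-v0.1"
# The ref currently being passed:
PASSED = "mistralai/mixtral-8x7b-instruct-v01"

def inference_url(model_id: str) -> str:
    return f"https://api-inference.huggingface.co/models/{model_id}"

# The two refs resolve to different URLs, so if the router stopped
# normalising ids, the dotless one would start 404ing:
different = inference_url(PASSED) != inference_url(CANONICAL)
```

If switching to the canonical id fixes the 404, that would strongly suggest the router’s id matching became stricter.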