Inference API stopped working

Mage has been around for a few years now, at least two or three, if not four. I discovered it through a Reddit post detailing various websites that had integrated AI image generation in some capacity. As far as features go, the following screenshots list the various plans for Mage and their features:



I currently use the Pro plan, and while it is pricey, the features and benefits you get access to make it well worth it. I personally like some of the exclusive models, such as the Mage-exclusive Illustrious/NoobAI model MagnoliaMix, which helped me produce this image, enhanced right within Mage:

Until HF Staff give us a clear, concise answer on what is happening, I will continue to use https://www.mage.space/.

3 Likes

Any updates on this issue?
I’m using the JS InferenceClient with the sentence-transformers/all-mpnet-base-v2 model and still getting an error. It was working fine 24 hours ago and has now stopped.
I also tried other models, such as sentence-transformers/all-MiniLM-L6-v2 and jinaai/jina-embeddings-v3; they fail as well.
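For reference, the failing call looks roughly like this, sketched with the Python InferenceClient as an analogue of the JS one (a valid token in HF_TOKEN is assumed):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# Feature-extraction (embeddings) request that currently errors out.
embedding = client.feature_extraction(
    "Hello world",
    model="sentence-transformers/all-mpnet-base-v2",
)
print(len(embedding))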

1 Like

For those unable to use HF, go to Mistral and get a free account to get an API key. Then use this class; it simulates the results coming back from InferenceClient when you use chat_stream().


import config  # local module that holds MISTRALAI_APIKEY
from mistralai import Mistral


# Minimal stand-ins for the objects InferenceClient yields, so existing
# code that reads packet.choices[0].delta.content keeps working.
class TextPacket:
    def __init__(self):
        self.choices = []

class TextMessage:
    def __init__(self):
        self.role = None
        self.content = None

class TextGroup:
    def __init__(self):
        self.index = 0
        self.finish_reason = None
        self.delta = TextMessage()
        self.message = TextMessage()

class MistralGenerator:
    def __init__(self):
        self.api_key = config.MISTRALAI_APIKEY
        self.model = "mistral-small-latest"
        self.client = Mistral(api_key=self.api_key)

    def chat_complete(self, query, max_tokens=512, temperature=0.7, top_p=0.9):
        # One-shot completion: send a single user message and return the text.
        chat_response = self.client.chat.complete(
            model=self.model,
            messages=[{"role": "user", "content": query}],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        print(chat_response.choices[0].message.content)
        return chat_response.choices[0].message.content

    def chat_stream(self, messages, max_tokens=512, temperature=0.7, top_p=0.9):
        # chat.stream() already returns an event stream, so no stream=True
        # flag is needed here.
        stream_response = self.client.chat.stream(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )

        for chunk in stream_response:
            # Wrap each Mistral chunk in an InferenceClient-shaped packet.
            message = TextPacket()
            group = TextGroup()
            group.index = 0
            group.delta.role = "assistant"
            group.delta.content = chunk.data.choices[0].delta.content
            message.choices.append(group)
            yield message

        # Final stop message for the stream; finish_reason belongs on the
        # choice, not on the delta.
        message = TextPacket()
        group = TextGroup()
        group.index = 0
        group.delta.role = "assistant"
        group.delta.content = ""
        group.finish_reason = "stop"
        message.choices.append(group)
        yield message
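A quick usage sketch (assuming config.MISTRALAI_APIKEY is set to your key), showing the InferenceClient-style loop these packets are built for:

gen = MistralGenerator()
for packet in gen.chat_stream([{"role": "user", "content": "Hello"}]):
    if packet.choices[0].delta.content:
        print(packet.choices[0].delta.content, end="")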

1 Like

No progress, but a new issue arose yesterday or today…

The above error seems to have been fixed directly, but the server seems to be malfunctioning today.

The question now is: why did it happen? And how can it be prevented?

1 Like

Current status: none of the text-to-image models work anymore, no matter whether I clone a model to my account or use any of the big players (yntec / digiplay).

I’m guessing that’s a side effect of the 404 error; I played around a bit with PHP and ran into the 404 brick wall every time I tried to use anything besides “stabilityai/stable-diffusion-xl-base-1.0” - that one even works on the testground!
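For illustration, here is a rough Python equivalent of what my PHP code does (a token in HF_TOKEN is assumed); swap in almost any other text-to-image model id and you get the 404:

import os
import requests

# SDXL base still answers; most other text-to-image repos 404 here.
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "a lighthouse at dawn"})
print(response.status_code)
if response.ok:
    with open("out.png", "wb") as f:
        f.write(response.content)  # raw image bytes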

I slapped a little testground together here:

The sad thing: I don’t see any error. The build runs fine, no errors - but if I hit “generate”: nothing (not even an error!!). This was still working on 2025-04-06; some time after that, HF had the glorious idea to “improve” something, which broke all image generation.

HF staff: if this only works on a paid account, fine! Tell me and I’m game. But more than 3 weeks without any real feedback is just… bad.

I added a little debug to the build process: these are the (py)-modules the space loads:

certifi==2025.4.26
fsspec==2025.3.0
pytz==2025.2
tzdata==2025.2
setuptools==65.5.1
cryptography==44.0.2
attrs==25.3.0
pip==25.1
packaging==25.0
aiofiles==24.1.0
pyarrow==20.0.0
websockets==15.0.1
rich==14.0.0
pillow==11.2.1
click==8.0.4
multidict==6.4.3
PyYAML==6.0.2
gradio==5.27.1
psutil==5.9.8
async-timeout==5.0.1
tqdm==4.67.1
typing-extensions==4.13.2
anyio==4.9.0
protobuf==3.20.3
filelock==3.18.0
aiohttp==3.11.18
orjson==3.10.16
idna==3.10
datasets==3.5.1
xxhash==3.5.0
charset-normalizer==3.4.1
jinja2==3.1.6
MarkupSafe==3.0.2
markdown-it-py==3.0.0
pydantic-core==2.33.1
requests==2.32.3
pycparser==2.22
pygments==2.19.1
pydantic==2.11.3
semantic-version==2.10.0
python-dateutil==2.9.0.post0
aiohappyeyeballs==2.6.1
urllib3==2.4.0
numpy==2.2.5
pandas==2.2.3
itsdangerous==2.2.0
yarl==1.20.0
cffi==1.17.1
six==1.17.0
gradio-client==1.9.1
frozenlist==1.6.0
shellingham==1.5.4
authlib==1.5.2
aiosignal==1.3.2
sniffio==1.3.1
exceptiongroup==1.2.2
httpcore==1.0.9
hf-xet==1.0.5
fastapi==0.115.12
multiprocess==0.70.16
starlette==0.46.2
wheel==0.45.1
spaces==0.35.0
uvicorn==0.34.2
huggingface-hub==0.30.2
httpx==0.28.1
pydub==0.25.1
h11==0.16.0
typer==0.15.3
tomlkit==0.13.2
ruff==0.11.7
annotated-types==0.7.0
ffmpy==0.5.0
typing-inspection==0.4.0
dill==0.3.8
propcache==0.3.1
hf-transfer==0.1.9
safehttpx==0.1.6
groovy==0.1.2
mdurl==0.1.2
python-multipart==0.0.20

Yes - if I get this working, I’ll drop all of these into requirements.txt and hopefully never have to hunt for errors I didn’t cause ever again.

Very frustrating.

2 Likes

why did it happen?

The cause of this large-scale outage may be hardware replacement. I think it happened when the A100s in the Zero GPU Spaces were replaced with H200s. Probably other services as well?

In the long term, I think Hugging Face was simply unable to handle the excessive number of Inference API requests as a company. julien-c mentioned something to that effect somewhere on the Hub.

Both are just speculation.

1 Like

The implementation of InferenceClient itself has changed significantly…
The implementation of Gradio’s external.py has also changed to use InferenceClient in the new version.

That being said, even after recovering from the large-scale failure, SmolLM2 is still not deployed at this point.

1 Like

https://api-inference.huggingface.co/google/flan-t5-base

500 (INTERNAL SERVER ERROR)
interview.component.ts:174 Error getting feedback: Hugging Face API request failed with status code: 401

error @ interview.component.ts:174
Zone - XMLHttpRequest.addEventListener:load
sendAnswerAndGetFeedback @ interview.component.ts:148
(anonymous) @ interview.component.ts:238
Zone - setTimeout
toggleRecording @ interview.component.ts:224
InterviewComponent_Template_button_click_12_listener @ interview.component.html:49
Zone - HTMLButtonElement.addEventListener:click
InterviewComponent_Template @ interview.component.html:49
Zone - Promise.then
(anonymous)

Why do I get this error when I try to use the API for the google/flan-t5 models?

1 Like

Not deployed to the HF API, it seems… One other thing worth checking from your log: the hosted endpoint path includes /models/, which the URL above is missing.
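Roughly, the request should look like this (HF_TOKEN assumed to hold a valid token):

import os
import requests

# Note the /models/ segment in the path.
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-base"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Translate to German: hello"})
print(response.status_code, response.text)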

But yesterday google/flan-t5-large worked for me, and now it doesn’t…

1 Like

I don’t understand what’s going on…:innocent:

Hi, I think I have the same problem.
I’ve published a model Eddy872/zoove-t5, and everything seems correctly configured:

  • The repository is public
  • I’ve added the proper metadata at the top of the README.md
  • The model works perfectly with the pipeline() method in Python
    However, the Inference API still returns a 404 error when calling it.
    I also tried using the InferenceApi() client in Python with raw_response=True, and still got a 404 (sketch below).
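For completeness, this is roughly the failing Python call (HF_TOKEN assumed to hold a valid token):

import os
from huggingface_hub import InferenceApi  # the (deprecated) client mentioned above

api = InferenceApi(repo_id="Eddy872/zoove-t5", token=os.environ["HF_TOKEN"])
resp = api(inputs="translate English to French: hello", raw_response=True)
print(resp.status_code)  # currently 404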

Is there anything else I need to do to trigger the activation of the hosted Inference API for my model?

Thanks in advance for your help!

2 Likes

and still got a 404

I think almost all individual users are in that state right now…
Or rather, even models from well-known companies are returning 404 errors.

2 Likes

Same problem appears today for me with my personal text generation model.

  • Originally, I used an old version (2.8.1) of the @huggingface/inference JS package - “Error fetching from Hugging Face API: An error occurred while fetching the blob” when trying to send text input.
  • Updated the package to the latest version (3.12.1) - the error changed to “No Inference Provider available for model …”
  • Went back to the old version and tried to specify the blob format (a workaround that had earlier helped with Whisper audio during this month’s API troubles) - “Unexpected token ‘N’, “Not Found” is not valid JSON”
  • Tried to send a raw HTTP request - Error 404: Model not found.

Yeah, something has been going very wrong since April 15. Probably the only options are to wait or to switch to a different platform (which isn’t worth it for my friends-only model experiments, so I’ll just keep hoping).

2 Likes

Same for me when using the whisper-large-v3 model with provider=“hf-inference” - it just won’t work.

1 Like

Does anyone have a solution for this? I am presenting my final year project in 2 days and have a model hosted on HF, and all of a sudden I can’t use it anymore. I would greatly appreciate any help, thank you!

1 Like

Hmm… Is this a different problem from the previous one? @michellehbn

Whisper is working fine for me, but I use large-v3-turbo without any provider specified. However, there were a lot of issues with it last month: Inference API error with Whisper, return_timestamps parameter
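For comparison, this is roughly the call that works for me (current huggingface_hub assumed, with sample.flac as a local audio file and a token in HF_TOKEN):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# No provider specified: plain serverless inference with the turbo model.
result = client.automatic_speech_recognition(
    "sample.flac",
    model="openai/whisper-large-v3-turbo",
)
print(result.text)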

1 Like