Inference API stopped working

Mage has been around for a few years now, at least two or three, if not four. I discovered it through a Reddit post detailing various websites that had integrated AI image generation in some capacity. As far as features go, the following screenshots list the various plans for Mage and their features:



I currently use the Pro plan, and while it is pricey, the features and benefits you get access to make it well worth it. I personally like some of the exclusive models, such as the Mage-exclusive Illustrious/NoobAI model MagnoliaMix, which helped me produce this image, enhanced right within Mage:

Until HF Staff give us a clear, concise answer on what is happening, I will continue to use https://www.mage.space/.

3 Likes

Any updates on this issue?
I’m using the JS InferenceClient with the sentence-transformers/all-mpnet-base-v2 model and still getting an error. It was working fine 24 hours ago and has now stopped.
I also tried other models, such as sentence-transformers/all-MiniLM-L6-v2 and jinaai/jina-embeddings-v3; they fail as well.
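For reference, the failing call looks roughly like this, sketched with the Python InferenceClient as an analogue of the JS one (a valid token in HF_TOKEN is assumed):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# Feature-extraction (embeddings) request that currently errors out.
embedding = client.feature_extraction(
    "Hello world",
    model="sentence-transformers/all-mpnet-base-v2",
)
print(len(embedding))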

1 Like

For those unable to use HF, go to Mistral and get a free account to get an API key. Then use this class; it simulates the results coming back from InferenceClient when you use chat_stream().


import config  # local module that holds MISTRALAI_APIKEY
from mistralai import Mistral


# Minimal stand-ins for the objects InferenceClient yields, so existing
# code that reads packet.choices[0].delta.content keeps working.
class TextPacket:
    def __init__(self):
        self.choices = []

class TextMessage:
    def __init__(self):
        self.role = None
        self.content = None

class TextGroup:
    def __init__(self):
        self.index = 0
        self.finish_reason = None
        self.delta = TextMessage()
        self.message = TextMessage()

class MistralGenerator:
    def __init__(self):
        self.api_key = config.MISTRALAI_APIKEY
        self.model = "mistral-small-latest"
        self.client = Mistral(api_key=self.api_key)

    def chat_complete(self, query, max_tokens=512, temperature=0.7, top_p=0.9):
        # One-shot completion: send a single user message and return the text.
        chat_response = self.client.chat.complete(
            model=self.model,
            messages=[{"role": "user", "content": query}],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        print(chat_response.choices[0].message.content)
        return chat_response.choices[0].message.content

    def chat_stream(self, messages, max_tokens=512, temperature=0.7, top_p=0.9):
        # chat.stream() already returns an event stream, so no stream=True
        # flag is needed here.
        stream_response = self.client.chat.stream(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )

        for chunk in stream_response:
            # Wrap each Mistral chunk in an InferenceClient-shaped packet.
            message = TextPacket()
            group = TextGroup()
            group.index = 0
            group.delta.role = "assistant"
            group.delta.content = chunk.data.choices[0].delta.content
            message.choices.append(group)
            yield message

        # Final stop message for the stream; finish_reason belongs on the
        # choice, not on the delta.
        message = TextPacket()
        group = TextGroup()
        group.index = 0
        group.delta.role = "assistant"
        group.delta.content = ""
        group.finish_reason = "stop"
        message.choices.append(group)
        yield message
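A quick usage sketch (assuming config.MISTRALAI_APIKEY is set to your key), showing the InferenceClient-style loop these packets are built for:

gen = MistralGenerator()
for packet in gen.chat_stream([{"role": "user", "content": "Hello"}]):
    if packet.choices[0].delta.content:
        print(packet.choices[0].delta.content, end="")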

1 Like

No progress, but a new issue arose yesterday or today…

The above error seems to have been fixed directly, but the server seems to be malfunctioning today.

The question now is: why did it happen? And how can it be prevented?

1 Like

Current status: none of the text-to-image models work anymore, no matter whether I clone a model to my account or use any of the big players (yntec / digiplay).

I’m guessing that’s a side effect of the 404 error; I played around a bit with PHP and ran into the 404 brick wall every time I tried to use anything besides “stabilityai/stable-diffusion-xl-base-1.0” - that one even works on the testground!
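For illustration, here is a rough Python equivalent of what my PHP code does (a token in HF_TOKEN is assumed); swap in almost any other text-to-image model id and you get the 404:

import os
import requests

# SDXL base still answers; most other text-to-image repos 404 here.
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "a lighthouse at dawn"})
print(response.status_code)
if response.ok:
    with open("out.png", "wb") as f:
        f.write(response.content)  # raw image bytes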

I slapped a little testground together here:

The sad thing: I don’t see any error. The build runs fine, no errors - but if I hit “generate”: nothing (not even an error!!). This was still working on 2025-04-06; some time after that, HF had the glorious idea to “improve” something, which broke all image generation.

HF staff: if this only works on a paid account, fine! Tell me and I’m game. But more than 3 weeks without any real feedback is just… bad.

I added a little debug to the build process: these are the (py)-modules the space loads:

certifi==2025.4.26
fsspec==2025.3.0
pytz==2025.2
tzdata==2025.2
setuptools==65.5.1
cryptography==44.0.2
attrs==25.3.0
pip==25.1
packaging==25.0
aiofiles==24.1.0
pyarrow==20.0.0
websockets==15.0.1
rich==14.0.0
pillow==11.2.1
click==8.0.4
multidict==6.4.3
PyYAML==6.0.2
gradio==5.27.1
psutil==5.9.8
async-timeout==5.0.1
tqdm==4.67.1
typing-extensions==4.13.2
anyio==4.9.0
protobuf==3.20.3
filelock==3.18.0
aiohttp==3.11.18
orjson==3.10.16
idna==3.10
datasets==3.5.1
xxhash==3.5.0
charset-normalizer==3.4.1
jinja2==3.1.6
MarkupSafe==3.0.2
markdown-it-py==3.0.0
pydantic-core==2.33.1
requests==2.32.3
pycparser==2.22
pygments==2.19.1
pydantic==2.11.3
semantic-version==2.10.0
python-dateutil==2.9.0.post0
aiohappyeyeballs==2.6.1
urllib3==2.4.0
numpy==2.2.5
pandas==2.2.3
itsdangerous==2.2.0
yarl==1.20.0
cffi==1.17.1
six==1.17.0
gradio-client==1.9.1
frozenlist==1.6.0
shellingham==1.5.4
authlib==1.5.2
aiosignal==1.3.2
sniffio==1.3.1
exceptiongroup==1.2.2
httpcore==1.0.9
hf-xet==1.0.5
fastapi==0.115.12
multiprocess==0.70.16
starlette==0.46.2
wheel==0.45.1
spaces==0.35.0
uvicorn==0.34.2
huggingface-hub==0.30.2
httpx==0.28.1
pydub==0.25.1
h11==0.16.0
typer==0.15.3
tomlkit==0.13.2
ruff==0.11.7
annotated-types==0.7.0
ffmpy==0.5.0
typing-inspection==0.4.0
dill==0.3.8
propcache==0.3.1
hf-transfer==0.1.9
safehttpx==0.1.6
groovy==0.1.2
mdurl==0.1.2
python-multipart==0.0.20

Yes - if I get this working, I’ll drop all of these into requirements.txt and hopefully never have to hunt for errors I didn’t cause ever again.

Very frustrating.

2 Likes

why did it happen?

The cause of this large-scale outage may be hardware replacement. I think it happened when the A100s in the Zero GPU Spaces were replaced with H200s. Probably other services as well?

In the long term, I think Hugging Face was simply unable to handle the excessive number of Inference API requests as a company. julien-c mentioned something to that effect somewhere on the Hub.

Both are just speculation.

1 Like

The implementation of InferenceClient itself has changed significantly…
The implementation of Gradio’s external.py has also changed to use InferenceClient in the new version.

That being said, even after recovering from the large-scale failure, SmolLM2 is still not deployed at this point.

1 Like

https://api-inference.huggingface.co/google/flan-t5-base

500 (INTERNAL SERVER ERROR)
interview.component.ts:174 Error getting feedback: Hugging Face API request failed with status code: 401

error @ interview.component.ts:174
Zone - XMLHttpRequest.addEventListener:load
sendAnswerAndGetFeedback @ interview.component.ts:148
(anonymous) @ interview.component.ts:238
Zone - setTimeout
toggleRecording @ interview.component.ts:224
InterviewComponent_Template_button_click_12_listener @ interview.component.html:49
Zone - HTMLButtonElement.addEventListener:click
InterviewComponent_Template @ interview.component.html:49
Zone - Promise.then
(anonymous)

Why do I get this error when I try to use the API for the google/flan-t5 models?

1 Like

Not deployed to the HF API, it seems… One other thing worth checking from your log: the hosted endpoint path includes /models/, which the URL above is missing.
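Roughly, the request should look like this (HF_TOKEN assumed to hold a valid token):

import os
import requests

# Note the /models/ segment in the path.
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-base"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Translate to German: hello"})
print(response.status_code, response.text)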

But yesterday google/flan-t5-large worked for me, and now it doesn’t…

1 Like

I don’t understand what’s going on…:innocent:

Hi, I think I have the same problem.
I’ve published a model Eddy872/zoove-t5, and everything seems correctly configured:

  • The repository is public
  • I’ve added the proper metadata at the top of the README.md
  • The model works perfectly with the pipeline() method in Python
    However, the Inference API still returns a 404 error when calling it.
    I also tried using the InferenceApi() client in Python with raw_response=True, and still got a 404 (sketch below).
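For completeness, this is roughly the failing Python call (HF_TOKEN assumed to hold a valid token):

import os
from huggingface_hub import InferenceApi  # the (deprecated) client mentioned above

api = InferenceApi(repo_id="Eddy872/zoove-t5", token=os.environ["HF_TOKEN"])
resp = api(inputs="translate English to French: hello", raw_response=True)
print(resp.status_code)  # currently 404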

Is there anything else I need to do to trigger the activation of the hosted Inference API for my model?

Thanks in advance for your help!

2 Likes

and still got a 404

I think almost all individual users are in that state right now…
Or rather, even models from well-known companies are returning 404 errors.

2 Likes

Same problem appears today for me with my personal text generation model.

  • Originally, I used an old version (2.8.1) of the @huggingface/inference JS package - “Error fetching from Hugging Face API: An error occurred while fetching the blob” when trying to send text input.
  • Updated the package to the latest version (3.12.1) - the error changed to “No Inference Provider available for model …”
  • Went back to the old version and tried to specify the blob format (a workaround that had earlier helped with Whisper audio during this month’s API troubles) - “Unexpected token ‘N’, “Not Found” is not valid JSON”
  • Tried to send a raw HTTP request - Error 404: Model not found.

Yeah, something has been going very wrong since April 15. Probably the only options are to wait or to switch to a different platform (which isn’t worth it for my friends-only model experiments, so I’ll just keep hoping).

2 Likes

Same for me when using the whisper-large-v3 model with provider=“hf-inference” - it just won’t work.

1 Like

Does anyone have a solution for this? I am presenting my final year project in 2 days and have a model hosted on HF, and all of a sudden I can’t use it anymore. I would greatly appreciate any help, thank you!

1 Like

Hmm… Is this a different problem from the previous one? @michellehbn

Whisper is working fine for me, but I use large-v3-turbo without any provider specified. However, there were a lot of issues with it last month: Inference API error with Whisper, return_timestamps parameter
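For comparison, this is roughly the call that works for me (current huggingface_hub assumed, with sample.flac as a local audio file and a token in HF_TOKEN):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# No provider specified: plain serverless inference with the turbo model.
result = client.automatic_speech_recognition(
    "sample.flac",
    model="openai/whisper-large-v3-turbo",
)
print(result.text)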

1 Like