Inference API down?

While accessing the speechbrain/lang-id-voxlingua107-ecapa model via the Inference API, I am getting the following error:

(MaxRetryError('HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/speechbrain/lang-id-voxlingua107-ecapa (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f5306290cd0>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)"))'), '(Request ID: d978f641-257c-45c4-b95b-c51865344dfe)')

Can someone provide more insight into this error? And how do we solve it?
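
One way to narrow this down is to check whether `huggingface.co` resolves from the machine making the request. The traceback reports a name-resolution failure, but it does not say on which side the lookup failed, so a local check only rules out the client's own DNS. Below is a minimal sketch using only the standard library; the helper `can_resolve` and its `resolver` parameter are hypothetical names introduced here for illustration:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if host resolves to at least one address, False on a DNS error.

    `resolver` is injectable (an assumption of this sketch) so the check can be
    exercised without network access.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # On Linux, errno -3 (EAI_AGAIN) is reported as
        # "Temporary failure in name resolution".
        return False
```

If `can_resolve("huggingface.co")` returns True locally while the API still reports the error above, the failing lookup is probably not happening on your machine, and retrying later or reporting the Request ID is likely the only option.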

I’m also facing issues with the Inference API all of a sudden.

Same here.

We are also facing issues with the Inference Endpoints.

Same for hours.

Thanks for reporting; the issue should be fixed now.

Fixed. Thank you.

@nielsr Does "fixed" mean that it now returns a 500 Internal Server Error? I am currently getting this error with all 3 providers and multiple models.

@nielsr I have debugged the problem further. The endpoints work as long as they are public, both with and without scale-to-zero. If I secure the endpoint and call it without a token, a 401 is returned. So far so good. But if I pass a valid token, I get a 500. Do your integration tests work?
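
The comparison described above (no token versus valid token) can be scripted so it is repeatable. This is a rough sketch using only the standard library, not the Inference Endpoints client; `probe_status`, `diagnose`, and the labels they return are hypothetical, and it assumes the endpoint accepts a bearer token in the `Authorization` header:

```python
import urllib.error
import urllib.request


def probe_status(url, token=None, data=None, opener=urllib.request.urlopen):
    """Return the HTTP status code for a request to url.

    Sends a GET, or a POST when `data` (bytes) is given. `opener` is
    injectable (an assumption of this sketch) so the function can be
    tested offline.
    """
    request = urllib.request.Request(url, data=data)
    if token is not None:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with opener(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; keep only the code.
        return err.code


def diagnose(status_without_token, status_with_token):
    """Map the two status codes to a rough, hypothetical label."""
    if status_with_token >= 500:
        return "server-side failure with token"
    if status_with_token in (401, 403):
        return "token rejected"
    if status_without_token == 401 and 200 <= status_with_token < 300:
        return "protected endpoint healthy"
    return "inconclusive"
```

With the behaviour reported here (401 without a token, 500 with one), `diagnose(401, 500)` returns "server-side failure with token", which points at the service rather than at the credentials.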

@nielsr OK, this is really weird now. For 2 hours I got a 401 from the UI when creating new endpoints, deleting existing ones (which cost me $12), or even viewing existing instances. Now the instance is visible again, and the endpoint is working with a token. So I have one last question: are you fixing things in production without customer feedback, and what kind of availability and stability can I expect from dedicated endpoints? Are they ready for production (>99.9% availability)?

Hi,

Yes, they should be ready for production (they aim to make putting ML models into production easier with a few clicks). I appreciate your feedback. I’m not part of the Inference Endpoints team, but I will forward it to them.

The APIs for the evaluate library have also been down for five days.

Just go to the evaluate-metric (Evaluate Metric) page and see the runtime errors.
