Is the serverless API completely broken and unreliable?

I couldn’t find any real information anywhere, so I subscribed to PRO to test whether I could use Llama 3.3 70B for my (small) app, since the API seemed fine with Mistral Nemo.

Unfortunately, I get garbage responses about half the time.

Example:

07 every 08: this is low10 an08: this is not a good 07/1.0780780:00 to 1.irectional 07: this is 08:0780780: this is boot1: this is 0780:0000: this is 07:00:0780780780: this is 07: this is 07:00: this is 08: this is 08: this is 1: this is 07: this is 07:00: this is 07: this is 07:0000: this is 07:00: this is 01 i 078: this is 08011: 08:000000:08080780:0000 is 081: 0000: this is 08:0780780: this is 01: this is 07:00:00:0780780780: this is 0780: this is 07:0:10000: this is 01:079079000: this is 07:1:0780:00:00000780: this is 07:0780:0:00:00: this is 07:00: this is 08: this is0780: this is 01: this is 00:00: this is 07:00:00: this is 07:00:078

I’ve tried every single parameter, with and without. I tested other models too: Qwen 72B is also broken, while small models work… Again, it’s not an issue with my code, since it works great SOME of the time.
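For reference, here is roughly what my request looks like (a sketch only: the model ID is the one I tested, but the prompt and parameter values are just examples, and the token is elided):

```python
import json

# Illustrative request to the serverless Inference API (text-generation task).
MODEL = "meta-llama/Llama-3.3-70B-Instruct"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL}"

payload = {
    "inputs": "Explain what a serverless inference API is in one sentence.",
    "parameters": {  # all optional; I tried with and without each of these
        "max_new_tokens": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": False,
    },
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (requires a PRO token):
# import urllib.request
# req = urllib.request.Request(
#     API_URL,
#     data=body,
#     headers={"Authorization": "Bearer hf_xxx",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```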

Oh, also: a few models like Gemma simply never work from the API (“model too busy”), although they answer instantly from the Playground.


Models that exceed 10 GB in total are not loaded into the Serverless Inference API unless Hugging Face explicitly allows it. Also, due to a lack of GPU resources, the policy changed a few months ago: the Serverless Inference API is not turned on for a model uploaded by an individual unless that model becomes quite famous.
In quite a few cases, though, a model can be used if it is explicitly loaded from the Playground.
Llama 3.3 seems to be supported… but it’s possible that the server-side settings are broken…:sweat_smile:
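You can also check whether the API actually has a model loaded before calling it. A minimal sketch, assuming the current URL scheme of the per-model status endpoint:

```python
API = "https://api-inference.huggingface.co"

def status_url(model_id: str) -> str:
    # Per-model status endpoint of the serverless Inference API
    # (assumption: this URL scheme is the current one).
    return f"{API}/status/{model_id}"

# To actually query it (needs an HF token):
# import urllib.request
# req = urllib.request.Request(
#     status_url("meta-llama/Llama-3.3-70B-Instruct"),
#     headers={"Authorization": "Bearer hf_xxx"},
# )
# print(urllib.request.urlopen(req).read().decode())  # JSON describing load state
```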

It looks fine from Hugging Chat.

There were two reports on the HF Discord that the Serverless Inference API for Llama 3.3 was not working properly. In other words, it seems there is something wrong with the server-side settings.


Hi everyone! The issue should be fixed now. Let us know if it happens again!


Great! Maybe because it’s the first Monday of the year!
