Is the serverless API completely broken and unreliable?

I couldn’t find any real information anywhere, so I subscribed to PRO to test whether I could use Llama 3.3 70B for my (small) app, since the API seemed fine with Mistral Nemo.

Unfortunately, I get garbage responses about half the time.

Example:

07 every 08: this is low10 an08: this is not a good 07/1.0780780:00 to 1.irectional 07: this is 08:0780780: this is boot1: this is 0780:0000: this is 07:00:0780780780: this is 07: this is 07:00: this is 08: this is 08: this is 1: this is 07: this is 07:00: this is 07: this is 07:0000: this is 07:00: this is 01 i 078: this is 08011: 08:000000:08080780:0000 is 081: 0000: this is 08:0780780: this is 01: this is 07:00:00:0780780780: this is 0780: this is 07:0:10000: this is 01:079079000: this is 07:1:0780:00:00000780: this is 07:0780:0:00:00: this is 07:00: this is 08: this is0780: this is 01: this is 00:00: this is 07:00:00: this is 07:00:078

I’ve tried every single parameter, with and without. I tested other models too: Qwen 72B is also broken, while small models work… Again, it’s not an issue with my code, since it works great SOME of the time.
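For reference, here is roughly what my request looks like (a sketch only: the model ID is the one I tested, but the prompt and parameter values are just examples, and the token is elided):

```python
import json

# Illustrative request to the serverless Inference API (text-generation task).
MODEL = "meta-llama/Llama-3.3-70B-Instruct"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL}"

payload = {
    "inputs": "Explain what a serverless inference API is in one sentence.",
    "parameters": {  # all optional; I tried with and without each of these
        "max_new_tokens": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": False,
    },
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (requires a PRO token):
# import urllib.request
# req = urllib.request.Request(
#     API_URL,
#     data=body,
#     headers={"Authorization": "Bearer hf_xxx",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```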

Oh, also: a few models like Gemma simply never work from the API (“model too busy”), although they answer instantly from the Playground.


Models that exceed 10 GB in total are not loaded into the Serverless Inference API unless Hugging Face explicitly allows it. Also, due to a lack of GPU resources, the policy changed a few months ago: the Serverless Inference API is not turned on for a model uploaded by an individual unless that model becomes quite famous.
In quite a few cases, though, a model can be used if it is explicitly loaded from the Playground.
Llama 3.3 seems to be supported… but it’s possible that the server-side settings are broken…:sweat_smile:
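You can also check whether the API actually has a model loaded before calling it. A minimal sketch, assuming the current URL scheme of the per-model status endpoint:

```python
API = "https://api-inference.huggingface.co"

def status_url(model_id: str) -> str:
    # Per-model status endpoint of the serverless Inference API
    # (assumption: this URL scheme is the current one).
    return f"{API}/status/{model_id}"

# To actually query it (needs an HF token):
# import urllib.request
# req = urllib.request.Request(
#     status_url("meta-llama/Llama-3.3-70B-Instruct"),
#     headers={"Authorization": "Bearer hf_xxx"},
# )
# print(urllib.request.urlopen(req).read().decode())  # JSON describing load state
```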

It looks fine from Hugging Chat.

There were two reports on the HF Discord that the Serverless Inference API for Llama 3.3 was not working properly. In other words, it seems there is something wrong with the server-side settings.


Hi everyone! The issue should be fixed now. Let us know if it happens again!


Great! Maybe because it’s the first Monday of the year!
