Serverless Inference API

Is the Serverless Inference API basically my own LLM engine?

When it works stably, you could say so, but no one knows the conditions under which it stays stable, and there is no explanation or guideline anywhere. The only way to find out is to measure it yourself, like an experiment in a natural science class.
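As one way to do that measuring, here is a minimal sketch that probes the API for a minute and counts successes versus 429 (rate-limit) responses. The model id and token are placeholders, and whatever numbers you get are only a snapshot; they may well change tomorrow.

```python
# A minimal sketch for probing the Serverless Inference API yourself.
# The model id and token below are placeholders; actual limits are
# undocumented, so treat the measured numbers as a snapshot only.
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"  # any small model
HEADERS = {"Authorization": "Bearer hf_xxx"}  # your own token here

ok, rate_limited = 0, 0
start = time.time()
while time.time() - start < 60:  # probe for one minute
    r = requests.post(API_URL, headers=HEADERS, json={"inputs": "Hello"})
    if r.status_code == 200:
        ok += 1
    elif r.status_code == 429:  # rate limit reached
        rate_limited += 1
        time.sleep(5)  # back off before retrying
    else:
        print(f"unexpected status: {r.status_code} {r.text[:100]}")
    time.sleep(1)

print(f"succeeded: {ok}, rate-limited: {rate_limited} in one minute")
```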

And if I pay $10 a month to HuggingFace, I get 300 queries per hour?

The Pro subscription allows relatively stable, regular use of Llama 70B, for example, but again there is no numerical guide to exactly how much you can use it. Even if we did measure it, it might change tomorrow…
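For reference, calling a 70B model through the API with a Pro token looks roughly like the sketch below. The model id is just one example of a 70B model; whether a given model is actually served serverlessly at any moment is exactly the undocumented part.

```python
# A minimal sketch of calling a 70B model via the Serverless Inference API.
# The model id is illustrative; availability depends on what HF currently
# serves, and the token must belong to an account with access.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    token="hf_xxx",  # your Pro account token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize what serverless inference is."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```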

In general, think of the Pro subscription as a service that is somewhat more comfortable, though to what extent no one knows; apparently the $20 Enterprise plan is much the same.
I'm also a subscriber, and the ZeroGPU Spaces are useful, though buggy.
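For context, a ZeroGPU Space attaches a GPU only while a decorated function runs. A minimal sketch, assuming the documented `spaces` package and an illustrative small model (this only runs inside a Space on ZeroGPU hardware):

```python
# A minimal ZeroGPU sketch: the GPU is allocated only for the duration
# of the decorated call. Model choice here is purely illustrative.
import spaces
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2",
                torch_dtype=torch.float16, device="cuda")

@spaces.GPU  # GPU attached only while this function executes
def generate(prompt: str) -> str:
    return pipe(prompt, max_new_tokens=50)[0]["generated_text"]
```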

P.S.

If you have a question about ZeroGPU, there is a dedicated community on HF, so you can reliably ask there; for the Serverless Inference API, however, there is no stable place to ask.
There is a GitHub repository for extending its functionality, but questions about server-side limits are probably outside its maintainers' scope.