Serverless Inference API

I saw your post on Reddit, but I have been banned for WrongThink and samizdat. Apparently, Google pulled some strings and got me banned from Reddit because I was calling out their paid shills in the Gemini subreddit and pointing out what an atrocious model they made. But I digress.

As such, I ask my question here:

Is the Serverless Inference API basically my own LLM engine? And if I pay $10 a month to Hugging Face, do I get 300 queries per hour? And because it’s an API, can I just hook it up to my favorite front end, e.g. Open WebUI?

I know there are many use cases…but would testing out models before I download them be a good use case?

And…does this basically replace the $7,000 AI rig I built in my home office?

Thank you!

Is the Serverless Inference API basically my own LLM engine?

When it works stably, you could say it is: you send requests to a hosted model over HTTP and get completions back. But no user knows the conditions under which it works stably, and there is no explanation or guideline for this anywhere. The only way to find out is to measure it yourself, like an experiment in a natural science class.
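For a concrete sense of what "engine" means here, below is a minimal sketch using the huggingface_hub client. The model name is an assumption; which models are actually live on the serverless tier changes over time, so treat this as illustrative rather than guaranteed to work against that exact model.

```python
# Minimal sketch: using the Serverless Inference API as a remote LLM engine.
# Assumptions: huggingface_hub is installed (pip install huggingface_hub),
# HF_TOKEN holds a valid Hugging Face access token, and the model below is
# currently deployed on the serverless tier (availability changes over time).
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical pick; use any hosted chat model
    token=os.environ["HF_TOKEN"],
)

# A chat-style request, the same shape a front end like Open WebUI would send.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Front ends such as Open WebUI that speak the OpenAI chat protocol can generally be pointed at an OpenAI-compatible endpoint, so the "hook it up to my favorite front end" part is plausible in principle; the unstable part is whether the model you want is actually being served at any given moment.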

And if I pay $10 a month to HuggingFace, I get 300 queries per hour?

The Pro subscription allows relatively stable, regular use of Llama 70B, for example, but again there is no published number for exactly how much you can use. And even if someone measured it, it might change tomorrow…
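If you do want to measure it yourself, a rough sketch like the following just sends cheap requests until it gets rate-limited. The endpoint shape, the 429/503 status codes, and the model name are assumptions based on the classic Inference API, and whatever number this reports may change without notice.

```python
# Rough sketch for empirically probing the serverless rate limit.
# Assumptions: the https://api-inference.huggingface.co/models/<id> endpoint,
# HTTP 429 on rate limiting, HTTP 503 while a model loads, and HF_TOKEN in the
# environment. Numbers observed this way may change without notice.
import os
import time
import requests

MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # hypothetical pick
URL = f"https://api-inference.huggingface.co/models/{MODEL}"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

ok = 0
start = time.time()
while time.time() - start < 3600:  # probe for up to one hour
    r = requests.post(
        URL,
        headers=HEADERS,
        json={"inputs": "ping", "parameters": {"max_new_tokens": 1}},
    )
    if r.status_code == 429:
        print(f"Rate limited after {ok} requests in {time.time() - start:.0f}s")
        break
    if r.status_code == 503:
        time.sleep(10)  # model is loading; wait and retry without counting it
        continue
    r.raise_for_status()
    ok += 1
    time.sleep(1)  # be polite; adjust to taste
else:
    print(f"No 429 within an hour; {ok} requests succeeded")
```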

In general, think of the Pro subscription as a service that is somehow more comfortable, though to what extent no one knows; apparently the $20 Enterprise tier is in the same situation.
I’m also a subscriber, and the ZeroGPU Spaces are useful, though buggy.

P.S.

If you have a question about ZeroGPU, there is a dedicated community on HF, so you can safely ask there, but there is no stable place to ask about the Serverless Inference API.
There is a GitHub repository for extending the functionality, but the issue of server-side limits is probably outside its maintainers’ expertise.