API Limits on Free Inference API

The documentation is rather vague about the limits of the Free Inference API, and similarly vague about what subscribing to a ‘Pro’ account changes in those limits.

Could somebody comment, from their own experience, on what the limits of the Inference API are? In particular:

  • Does moving to Pro raise the limit on the model size that can be used? (Free has a limit of 10 GB.)
  • Are there any hourly / monthly character (or token?) limits for queries or responses?
  • Is there any rate limiting (requests per minute)?
  • Does Pro change anything regarding the time until a model is loaded / unloaded?

Bonus question:

  • Is there a way to use quantized models with Free Inference API?
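For context, here is roughly how I am calling the API. This is a minimal sketch, not official guidance: the model name is just a placeholder, and the retry policy is my own guess. The API does return HTTP 503 with an `estimated_time` field while a model is loading, and HTTP 429 when rate-limited, which is what the sketch handles:

```python
import json
import time
import urllib.error
import urllib.request

# Placeholder model; substitute whatever you are querying.
API_URL = "https://api-inference.huggingface.co/models/gpt2"

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff delay (seconds) for a given retry attempt."""
    return min(cap, base * (2 ** attempt))

def query(payload, token, max_retries=5):
    """POST to the serverless Inference API, retrying on 503 (loading) and 429 (rate limit)."""
    data = json.dumps(payload).encode("utf-8")
    for attempt in range(max_retries):
        req = urllib.request.Request(
            API_URL,
            data=data,
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as e:
            if e.code == 503:
                # Model still loading; the body usually carries estimated_time.
                body = json.loads(e.read() or b"{}")
                time.sleep(body.get("estimated_time", backoff_delay(attempt)))
            elif e.code == 429:
                # Rate limited; back off and retry.
                time.sleep(backoff_delay(attempt))
            else:
                raise
    raise RuntimeError("gave up after repeated 429/503 responses")
```

Even with this retry loop, I cannot tell from the responses what the actual quota is, hence the questions above.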

Thanks!


I am interested in this too.

I’m interested in this topic too. Thanks for bringing it up.

I’m interested in this topic too.

Still no clear information about this. Could any HF members clarify, please?


Did anyone figure this out?

Also running into API rate limits with the PRO subscription. It would be helpful if there were published numbers on how to navigate the limits.

Hi @GuusBouwensNL, can you tell us what API rate limit you are able to achieve with the PRO plan?
After how many requests do you hit the rate limit?
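If it helps, one rough way to answer this empirically is to fire small requests until the first HTTP 429 and count how many got through. This is only a sketch under assumptions: the model URL is a placeholder, and the numbers you get will depend on the model and on current load, so treat the result as an observation, not a documented quota:

```python
import json
import time
import urllib.error
import urllib.request

# Placeholder model; substitute your own.
API_URL = "https://api-inference.huggingface.co/models/gpt2"

def observed_rpm(send_times):
    """Requests per minute implied by a list of send timestamps (seconds)."""
    if len(send_times) < 2:
        return float(len(send_times))
    span = send_times[-1] - send_times[0]
    return (len(send_times) - 1) * 60.0 / span if span > 0 else float("inf")

def probe_until_429(token, max_requests=200):
    """Send small requests until the first HTTP 429; report count and observed rate."""
    payload = json.dumps({"inputs": "ping"}).encode("utf-8")
    times = []
    for sent in range(max_requests):
        req = urllib.request.Request(
            API_URL,
            data=payload,
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(req).read()
            times.append(time.monotonic())
        except urllib.error.HTTPError as e:
            if e.code == 429:
                # Rate limited: report what we managed before hitting it.
                return sent, observed_rpm(times)
            raise
    return max_requests, observed_rpm(times)
```

Running this a few times at different hours would at least show whether the limit is a fixed requests-per-minute cap or something more fluid.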

Interested in this too.

Also arrived here with the same interest.

I’m also interested in the answer to these questions!

Ok… found it. It’s flexible…

Still not clear.

It is still vague and fluid, even for Pro account users.
With Pro status the restrictions are relaxed, but by how much is not documented anywhere official. That’s all I can say.
Plus, unlike in 2023, even the smaller models don’t work as well now. The Free Serverless Inference API and the widgets are virtually obsolete now for all but the most popular models.

Is there a way to use quantized models with Free Inference API?

So far, this is still not possible. Now that quantization is becoming more and more commonplace, and there is no particular reason not to support it, HF will have to deal with it eventually…