API Limits on Free Inference API

The documentation is rather vague about the limits of the Free Inference API, and similarly vague about what subscribing to a ‘Pro’ account would change in those limits.

Could somebody comment, from their own experience, on what the limits of the Inference API are? In particular:

  • Does moving to Pro change the limit for the model size which can be used? (Free has a limit of 10GB)
  • Are there any hourly / monthly character (or token?) limits for queries or responses?
  • Is there any rate limiting (requests per minute)?
  • Does Pro change anything regarding the time until a model is loaded / unloaded?
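On the load/unload point, what I have observed with the free tier is that a cold model responds with HTTP 503 and a JSON body containing an `estimated_time` field until loading finishes. A minimal polling sketch, assuming that status code and field name (the `query` callable is a hypothetical stand-in for whatever HTTP call you make):

```python
import time

def wait_for_model(query, max_wait=300.0, poll_delay=2.0):
    """Poll `query` until the model is loaded or `max_wait` seconds elapse.

    `query` is any callable returning (status_code, json_body). While the
    model is cold-loading, the API (as observed, not guaranteed) returns
    503 with a body like {"error": "...", "estimated_time": 20.0}.
    """
    waited = 0.0
    while True:
        status, body = query()
        if status != 503:
            # Loaded (or failed for some other reason) -- hand back as-is.
            return status, body
        # Prefer the server's own estimate; fall back to a fixed delay.
        delay = float(body.get("estimated_time", poll_delay))
        delay = min(delay, max_wait - waited)
        if delay <= 0:
            return status, body  # out of patience, still loading
        time.sleep(delay)
        waited += delay
```

Whether Pro shortens those 503 windows is exactly what I am asking about above.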

Bonus question:

  • Is there a way to use quantized models with Free Inference API?

Thanks!


I am interested in this too.

I’m interested in this topic too. Thanks for bringing it up.

I’m interested in this topic too.

Still no clear information about this. Could any HF team members clarify, please?

Did anyone figure this out?

Also running into API rate limits with the Pro subscription. It would be helpful if there were documentation on how to navigate the limits.
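Until the actual limits are documented, a defensive client can at least back off when the API signals throttling. HTTP 429 is the standard status code for rate limiting; whether the Inference API uses it in every throttling case is an assumption on my part. A minimal exponential-backoff sketch (the `call` argument is a hypothetical stand-in for your request function):

```python
import time

def call_with_backoff(call, max_retries=5, base_delay=1.0):
    """Invoke `call` (any callable returning (status_code, body)),
    retrying with exponential backoff whenever it reports HTTP 429."""
    for attempt in range(max_retries):
        status, body = call()
        if status != 429:
            return status, body  # success or a non-throttling error
        # Sleep 1s, 2s, 4s, ... before the next attempt.
        time.sleep(base_delay * (2 ** attempt))
    return status, body  # still throttled after max_retries attempts
```

This doesn't answer what the limits *are*, but it keeps a batch job from failing outright the moment it crosses them.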