API Limits on Free Inference API

The documentation is rather vague about the limits of the Free Inference API, and similarly vague about what subscribing to a ‘Pro’ account would change in those limits.

Could somebody share, from their experience, what the limits of the Inference API are? In particular:

  • Does moving to Pro change the limit on the model size that can be used? (Free has a limit of 10 GB.)
  • Are there any hourly / monthly character (or token?) limits for queries or responses?
  • Is there any rate limiting (requests per minute)?
  • Does Pro change anything regarding the time until a model is loaded / unloaded?

Bonus question:

  • Is there a way to use quantized models with the Free Inference API?

Thanks!


I am interested in this too.

I’m interested in this topic too. Thanks for bringing it up.

I’m interested in this topic too.