The documentation is rather vague on the limits of the Free Inference API, and similarly vague about what subscribing to a ‘Pro’ account would change in those limits.
Could somebody comment from their experience on what the limits of the Inference API are? In particular:
Does moving to Pro change the limit for the model size which can be used? (Free has a limit of 10GB)
Are there any hourly / monthly character (or token?) limits for queries or responses?
Is there any rate limiting (requests per minute)?
Does Pro change anything regarding the time until a model is loaded / unloaded?
Bonus question:
Is there a way to use quantized models with Free Inference API?
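Since the rate limits aren't documented, one way to cope with them empirically is to retry on HTTP 429 with exponential backoff. A minimal sketch (the helper name and parameters are my own, not an official API; the callable would typically wrap a `requests.post` to `https://api-inference.huggingface.co/models/<model-id>`):

```python
import time


def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request while the server answers HTTP 429 (rate limited).

    `send_request` is any zero-argument callable returning an object with
    a `status_code` attribute (e.g. a bound `requests.post` call). Waits
    double after each 429; the last response is returned either way.
    """
    resp = None
    for attempt in range(max_retries):
        resp = send_request()
        if resp.status_code != 429:
            return resp
        # Back off: 1s, 2s, 4s, ... before retrying.
        time.sleep(base_delay * (2 ** attempt))
    return resp
```

Logging how many calls succeed before the first 429 appears is also a cheap way to measure the effective requests-per-minute limit on your own account.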
Hi @GuusBouwensNL, could you tell us what API rate limit you are able to achieve with the Pro plan?
After how many requests do you hit the rate limit?
It is still vague and fluid, even for Pro account users.
With Pro status the restrictions are relaxed, but the exact numbers aren’t published anywhere I know of. That’s all I can say.
Plus, unlike in 2023, even the smaller models don’t perform well enough now. The Free Serverless Inference API and the widgets are virtually obsolete for all but the very best models.
Is there a way to use quantized models with Free Inference API?
So far, this is still not possible. Now that quantization is becoming more and more commonplace, and there is no particular reason not to support it, HF will have to deal with it eventually…
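For background (this is a toy illustration, not how HF or any real library implements it): quantization just stores weights at reduced precision plus a scale factor, which is why quantized checkpoints fit under size limits that full-precision ones exceed. A minimal symmetric int8 round-trip:

```python
def quantize_int8(values):
    """Toy symmetric int8 quantization: map floats onto [-127, 127].

    Real schemes (GPTQ, bitsandbytes, GGUF) work block-wise and are far
    more sophisticated; this only shows the basic idea.
    """
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid scale == 0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from int8 codes and the stored scale."""
    return [q * scale for q in quantized]
```

Each value is stored in one byte instead of four (float32), so a model’s weight file shrinks roughly 4x, at the cost of a small per-value rounding error bounded by the scale.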