Question about Hugging Face Inference API

priyanshu26 · April 30, 2024, 7:37pm

I would be glad if anyone could help me with this question.

If I subscribe to Hugging Face PRO, what is the highest rate limit for the Inference API (serverless)?

michellehbn · May 6, 2024, 12:40pm

Hi @priyanshu26, the free Inference API (serverless) is our solution for easily exploring and evaluating models, and it is subject to rate limiting. We don’t publish the rate limit numbers because they change with the volume of traffic we receive. PRO and Enterprise organization accounts get priority.

For larger volumes of requests, or if you need guaranteed latency/performance, we recommend using Inference Endpoints (dedicated) to easily deploy your models on dedicated, fully managed infrastructure. Inference Endpoints give you the flexibility to quickly create endpoints on CPU or GPU resources, and are billed by compute uptime rather than by character usage. Further pricing information can be found here.
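Since the serverless limits are not published and can change, a client usually has to discover them at run time and back off when it is throttled. Below is a minimal sketch of that idea, not an official client: `call_with_backoff` and its parameters are hypothetical names, and treating HTTP 429 (and 503) as "retry later" is an assumption about how throttling shows up. The commented wiring at the bottom also assumes the `requests` library and a serverless URL of the form shown; the model id and token are placeholders.

```python
import time
from typing import Callable, Tuple

# Status codes treated as "try again later" (an assumption, see lead-in).
RETRYABLE = (429, 503)


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status.

    send() must return (status_code, body). Between attempts the wait
    doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    After max_retries retries the last response is returned as-is.
    """
    status, body = send()
    for attempt in range(max_retries):
        if status not in RETRYABLE:
            break
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    return status, body


# Example wiring (assumed URL shape, placeholder model id and token):
#
# import requests
#
# def send():
#     r = requests.post(
#         "https://api-inference.huggingface.co/models/<model-id>",
#         headers={"Authorization": "Bearer <your-token>"},
#         json={"inputs": "Hello"},
#     )
#     return r.status_code, r.text
#
# status, body = call_with_backoff(send)
```

If requests still fail after the retries are exhausted, that is a sign the workload has outgrown the serverless tier and a dedicated Inference Endpoint is the better fit.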
