I would be glad if anyone could help me with this question.
If I subscribe to Hugging Face PRO, what is the highest rate limit for the Inference API (serverless)?
Hi @priyanshu26, the free Inference API (serverless) is our solution for easily exploring and evaluating models, and it is subject to rate limiting. We don’t publish the rate limit numbers because they change depending on how much traffic we receive. PRO and Enterprise organization accounts get priority.
For larger volumes of requests, or if you need guaranteed latency/performance, we recommend using Inference Endpoints (dedicated) to deploy your models on dedicated, fully managed infrastructure. Inference Endpoints give you the flexibility to quickly create endpoints on CPU or GPU resources, and are billed by compute uptime rather than by character usage. Further pricing information can be found here.
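Since the exact limits are not published, client code calling the serverless API may want to handle rate limiting gracefully rather than assume a fixed quota. Below is a minimal sketch of retrying with exponential backoff. It assumes a rate-limited request comes back with HTTP status 429 (the standard "Too Many Requests" code). The names `call_with_backoff`, `send`, `max_retries`, and `base_delay` are hypothetical and not part of any Hugging Face library.

```python
import time
from typing import Any, Callable, Tuple

# Assumption: a rate-limited request is signalled with HTTP 429.
RATE_LIMITED = 429


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() and retry with exponential backoff while it is rate limited.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body). The last response is returned once it is
    not rate limited or the retries are exhausted.
    """
    status, body = send()
    for attempt in range(max_retries):
        if status != RATE_LIMITED:
            break
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    return status, body
```

Here `send` would wrap a single HTTP POST to the model you are querying, using your own access token. If you hit the rate limit regularly even with backoff, that is usually the point at which a dedicated Inference Endpoint becomes the better fit.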