How many user requests can Spaces process in parallel?

Sorry if I missed this in the documentation.

Are user requests processed in parallel or sequentially?

Many thanks in advance for your time.

Hi @harrycoppock

In short, whether user requests are processed in parallel or sequentially depends on the hardware resources allocated to the Space and on the web server implementation of the app running in it.
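To make that concrete, here is a minimal illustrative sketch, assuming the Space runs a Gradio app (parameter names from Gradio 4.x; older releases used `concurrency_count` instead). The queue's concurrency setting is what determines how many requests a single instance works on at the same time:

```python
# Illustrative sketch only (assumes Gradio 4.x): the queue's concurrency
# setting controls how many requests one Space instance handles at once.
import time
import gradio as gr

def predict(text: str) -> str:
    time.sleep(2)  # simulate model latency
    return text.upper()

demo = gr.Interface(fn=predict, inputs="text", outputs="text")

# With default_concurrency_limit=4, up to four queued requests are processed
# concurrently by this single instance; additional requests wait in the queue.
demo.queue(default_concurrency_limit=4)
demo.launch()
```

Whether raising that limit actually helps then comes down to the hardware the Space is running on.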

P.S. For paid clients, we offer Spaces replicas with round-robin load balancing. Essentially, your Space is duplicated, and a load balancer distributes the requests across the replicas.
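Conceptually, round-robin just cycles through the replicas in order. The sketch below is only an illustration of that idea (the replica URLs and the `route` function are hypothetical; the real load balancer is managed by Hugging Face, not user code):

```python
# Conceptual sketch of round-robin dispatch over hypothetical replica URLs.
from itertools import cycle

replicas = cycle([
    "https://replica-1.example",  # hypothetical endpoints
    "https://replica-2.example",
    "https://replica-3.example",
])

def route(request_payload):
    target = next(replicas)          # pick the next replica in rotation
    return target, request_payload   # a real balancer would forward the request here
```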


Hi @radames

Thank you for the speedy response – this is super useful. Wow, what an amazing service! Out of interest, why do you call them demos when it appears they can scale to many thousands of users?

Hi @harrycoppock, for production inference we offer a more robust, autoscaling, dedicated infrastructure managed by us.

Hi @radames – Have Spaces replicas been discontinued? I’m a Pro user and I don’t have access to this. Did you mean enterprise clients? Also, how does this work with persistent storage? Do replicas share a common volume? All the best!

Hi @akgunomerfaruk, please contact api-enterprise@huggingface.co for individual scaling requests. Persistent storage is shared across replicas. Thanks for your interest.

Thank you @radames for the explanations. One more dumb question from me: does the price increase linearly with the number of GPUs that are added? For example, if I need one more GPU because I get more requests than the predefined threshold, will it just add the price of that GPU with no extra cost?

Sorry, @coralexbadea, could you also please reach out to api-enterprise@huggingface.co? Thanks!