Sorry if I missed this in the documentation.
Are user requests processed in parallel or sequentially?
Many thanks in advance for your time.
In short, it depends: whether user requests are handled in parallel or sequentially is determined by the hardware resources available and by the web server implementation the Space uses.
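As an illustration of that difference (this is a generic sketch, not the actual Spaces internals), the same set of requests can be served one at a time by a single worker or concurrently by a pool of workers; the server's worker configuration is what decides which happens:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(req_id: int) -> str:
    # Simulate I/O-bound work, e.g. waiting on a model call.
    time.sleep(0.01)
    return f"response-{req_id}"

requests = list(range(8))

# Sequential: a single worker serves one request at a time.
sequential = [handle_request(r) for r in requests]

# Parallel: a pool of 4 workers keeps several requests in flight at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(handle_request, requests))

# Both modes return the same responses; only throughput differs.
assert sequential == parallel
```

The responses are identical either way; the parallel version simply overlaps the waiting time, which is why throughput scales with hardware and server configuration.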
Hi @radames
thank you for the speedy response - this is super useful. Wow, what an amazing service! Out of interest, why do you call them demos when it appears you can scale to many thousands of users?
hi @harrycoppock, for production inference we offer a more robust, autoscaling, dedicated infrastructure managed by us.
Hi @radames – Have Spaces replicas been discontinued? I’m a pro user and I don’t have access to this. Did you mean enterprise clients? Also, how does this work with persistent storage? Do replicas share a common volume? All the best!
hi @akgunomerfaruk, please contact api-enterprise@huggingface.co for individual scaling requests; persistent storage is shared across replicas. Thanks for your interest.
Thank you @radames for the explanations. One more dumb question from me: will the price increase linearly with the number of GPUs that are added? For example, if I need one more GPU because I get too many (predefined) requests, will it just add the price of that GPU, with no extra cost?
sorry, @coralexbadea, could you also please reach out to api-enterprise@huggingface.co? Thanks