Sorry if I missed this in the documentation.
Are user requests processed in parallel or sequentially?
Many thanks in advance for your time.
Here are some key points to consider:
In summary, the processing of user requests will depend on the hardware resources and web server implementation.
P.S. For paid clients, we offer Spaces replicas with round-robin load balancing. Essentially, your Space is duplicated, and a load balancer distributes the requests across the replicas.
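To make the round-robin idea concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not how the Spaces load balancer is actually implemented; the `RoundRobinBalancer` class and the replica names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands each request to the next replica in turn.

    Hypothetical illustration only; not the actual Spaces implementation.
    """

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        # cycle() repeats the replicas endlessly in their original order.
        self._replicas = cycle(replicas)

    def route(self, request):
        # Pick the next replica in the rotation and pair it with the request.
        replica = next(self._replicas)
        return replica, request


# Three made-up replicas of the same Space.
balancer = RoundRobinBalancer(["replica-0", "replica-1", "replica-2"])
assignments = [balancer.route(f"request-{i}")[0] for i in range(5)]
print(assignments)
# → ['replica-0', 'replica-1', 'replica-2', 'replica-0', 'replica-1']
```

Each replica still handles its own requests according to its hardware and web server, so adding replicas spreads the load rather than changing how a single copy processes requests.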
Hi @radames
Thank you for the speedy response, this is super useful. Wow, an amazing service. Out of interest, why do you call them demos when it appears you can scale to many thousands of users?
Hi @harrycoppock, for production inference we offer a more robust, autoscaling, dedicated infrastructure managed by us.