How can I run concurrent image generation using SDXL, and how should I handle the concurrent requests?

I am using an A100 GPU.

  1. When I have concurrent requests to my API, all the requests go into a queue first. Then, after an image finishes generating, does it stack up and wait for all the other concurrent requests to complete, so that responses are only sent after all the images are generated, first in, last out?

  2. How can I handle at least 100-200 requests per minute on an A100?

Please help me out

Thank you

We’re not a server-side library such as TorchServe or TensorFlow Extended that provides optimized support for such things, so you will have to implement those things yourself.
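One common way to implement this yourself is dynamic batching: a background worker drains incoming requests into a batch, runs one batched generation call on the GPU, and resolves each request's future individually, so each client gets its response as soon as its batch finishes rather than waiting for all other requests. Below is a minimal sketch of that pattern; `generate_batch`, `MAX_BATCH`, and `MAX_WAIT_S` are placeholder names and values I've made up for illustration — in a real server you would replace `generate_batch` with a batched diffusers SDXL pipeline call and tune the batch size to your A100's memory.

```python
import queue
import threading
import time
from concurrent.futures import Future

# Hypothetical stand-in for a batched SDXL pipeline call. In a real server,
# this would be something like pipe(prompt=prompts).images on the GPU.
def generate_batch(prompts):
    return [f"image for: {p}" for p in prompts]

MAX_BATCH = 8       # prompts per GPU batch (assumption; tune for your A100)
MAX_WAIT_S = 0.05   # max time the worker waits to fill up a batch

request_queue = queue.Queue()

def batching_worker():
    while True:
        # Block until at least one request arrives.
        batch = [request_queue.get()]
        deadline = time.monotonic() + MAX_WAIT_S
        # Collect more requests until the batch is full or the deadline passes.
        while len(batch) < MAX_BATCH:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=timeout))
            except queue.Empty:
                break
        images = generate_batch([prompt for prompt, _ in batch])
        # Resolve each caller's future individually, so responses go out
        # per request instead of first-in, last-out.
        for (_, fut), img in zip(batch, images):
            fut.set_result(img)

threading.Thread(target=batching_worker, daemon=True).start()

def submit(prompt):
    """Called by each API request handler; blocks only that one request."""
    fut = Future()
    request_queue.put((prompt, fut))
    return fut

if __name__ == "__main__":
    futures = [submit(f"prompt {i}") for i in range(3)]
    for f in futures:
        print(f.result())
```

With this shape, each request handler calls `submit(...)` and waits on its own future, so concurrent requests share GPU batches but receive independent responses. Whether 100-200 requests/minute is reachable then depends on your steps-per-image and batch size, not on the queueing.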
