Inference Endpoint - Simultaneous Generations taking a long time

I’ve deployed an endpoint based on this template, running on a medium GPU instance: philschmid/ControlNet-endpoint

It works great when I test a single image generation, but when I fire off 3 or 4 requests one right after another, the generation time goes from about 9 seconds to over 60 seconds. Am I missing something, or is there something I need to add so the endpoint handles load better?
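For reference, this is roughly how I’m firing the test requests (a minimal sketch: the endpoint URL and token are placeholders, and the JSON payload is simplified, since the actual ControlNet handler takes more fields than a bare prompt):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholders -- substitute your own deployment's URL and token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def generate(prompt: str) -> float:
    """Send one generation request and return the elapsed time in seconds."""
    start = time.perf_counter()
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": prompt},  # simplified payload for illustration
        timeout=300,
    )
    response.raise_for_status()
    return time.perf_counter() - start

# Fire 4 requests back to back and print each one's latency.
with ThreadPoolExecutor(max_workers=4) as pool:
    timings = list(pool.map(generate, ["a photo of a cat"] * 4))

for i, t in enumerate(timings, 1):
    print(f"request {i}: {t:.1f}s")
```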
