Inference Endpoint - Simultaneous Generations taking a long time

mckinley32 · March 14, 2023, 3:55pm

I’ve deployed an endpoint on a GPU medium instance, based on this template: philschmid/ControlNet-endpoint (Hugging Face).

It works great when I test a single image generation, but when I send 3 or 4 requests one right after another, the generation time goes from about 9 seconds to over 60 seconds. Is there something I’m missing, or something I need to add, so that the endpoint handles load better?
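To illustrate one possible contributor, here is a minimal Python sketch. It assumes the endpoint runs a single worker that can only process one generation at a time, so requests sent back-to-back queue behind each other. This is an assumption: the post does not say how the template handles concurrency. `generate`, `burst`, and `GENERATION_SECONDS` are hypothetical names, and times are scaled down 100x (0.09 s stands in for ~9 s).

```python
import asyncio
import time

# Hypothetical stand-in for one image generation: 0.09 s plays the role of
# the ~9 s single-request time (scaled down 100x so the sketch runs quickly).
GENERATION_SECONDS = 0.09


async def generate(gpu_lock: asyncio.Lock) -> float:
    """Simulate one request: wait for the single GPU worker, then 'generate'."""
    start = time.perf_counter()
    # Assumption: only one generation can run at a time, so later requests wait here.
    async with gpu_lock:
        await asyncio.sleep(GENERATION_SECONDS)
    # Latency as seen by this client, including time spent waiting in the queue.
    return time.perf_counter() - start


async def burst(n_requests: int) -> list[float]:
    """Send n_requests at the same time and return each request's latency."""
    gpu_lock = asyncio.Lock()
    return await asyncio.gather(*(generate(gpu_lock) for _ in range(n_requests)))


if __name__ == "__main__":
    for n in (1, 4):
        latencies = asyncio.run(burst(n))
        # Print latencies as multiples of the single-request generation time.
        print(n, [round(x / GENERATION_SECONDS, 1) for x in latencies])
```

Under this assumption, the k-th request in a burst waits for the k-1 requests ahead of it, so the fourth of four ~9 s requests would take roughly 36 s. Queuing alone would therefore not fully account for the 60+ s observed.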
