Hi, everyone. I'm using inference-benchmarker (GitHub - huggingface/inference-benchmarker: Inference server benchmarking tool), and when I run multiple vLLM containers behind nginx as a load balancer, I don't get (number of containers) × (throughput of a single vLLM container). Is there something else I should consider?
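For context, the setup is roughly like the sketch below; the ports, addresses, and listen port are just placeholders, not my exact config:

```nginx
# Simplified nginx load-balancer sketch for two vLLM containers.
# Backend ports 8000/8001 and listen port 8080 are placeholders.
upstream vllm_backends {
    # least_conn often balances long-running generation requests better
    # than the default round-robin, since request durations vary a lot.
    least_conn;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    listen 8080;
    location / {
        proxy_pass http://vllm_backends;
        # Disable buffering so streamed tokens pass through immediately.
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```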
It seems to target the normal single-instance vLLM use case, so I think this would get implemented if you raise an issue on the repo.