Inference benchmark (vllm with nginx)

Hi, everyone. I’m using inference-benchmarker (GitHub - huggingface/inference-benchmarker: Inference server benchmarking tool). When I run multiple vLLM containers behind nginx as a load balancer, the measured throughput is not #containers × (single vLLM container throughput). Is there something else I should be considering?
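For context, my setup is roughly like the sketch below (addresses, ports, and the upstream name are placeholders, not my exact config), with each `server` entry pointing at one vLLM container:

```
# Placeholder nginx config for load-balancing multiple vLLM containers.
upstream vllm_backends {
    least_conn;                 # route each request to the least-busy backend
    server 127.0.0.1:8000;      # vLLM container 1
    server 127.0.0.1:8001;      # vLLM container 2
    keepalive 256;              # reuse connections to the upstreams
}

server {
    listen 80;

    location / {
        proxy_pass http://vllm_backends;
        proxy_http_version 1.1;          # needed for upstream keepalive
        proxy_set_header Connection "";  # don't forward "Connection: close"
        proxy_read_timeout 300s;         # long generations exceed the 60s default
        proxy_buffering off;             # don't buffer streamed token responses
    }
}
```

I’m not sure whether nginx-side settings like `worker_connections`, upstream `keepalive`, or the load-balancing method could be the bottleneck here, or whether the loss comes from somewhere else.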


The tool seems to target the normal single-server vLLM use case, so if you raise an issue on the GitHub repository, support for this kind of load-balanced setup may well get implemented.