Hi, everyone. I'm using inference-benchmarker (GitHub - huggingface/inference-benchmarker: Inference server benchmarking tool), and when I run multiple vLLM containers behind nginx as a load balancer, I don't get (number of containers) × (throughput of a single vLLM container). Is there something else I should consider?
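For context, the setup is roughly like the sketch below; the ports, addresses, and listen port are just placeholders, not my exact config:

```nginx
# Simplified nginx load-balancer sketch for two vLLM containers.
# Backend ports 8000/8001 and listen port 8080 are placeholders.
upstream vllm_backends {
    # least_conn often balances long-running generation requests better
    # than the default round-robin, since request durations vary a lot.
    least_conn;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    listen 8080;
    location / {
        proxy_pass http://vllm_backends;
        # Disable buffering so streamed tokens pass through immediately.
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```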
It seems to target the normal single-instance vLLM use case, so I think this would get implemented if you raise an issue on the repo.