Hello,
I have an endpoint and my team and I are sending requests to the same endpoint and it ends up crashing our individual runs. Do endpoints allow for concurrent queries and, if so, are there any changes done in the setup creation of the endpoint?
Thanks!
1 Like
Hey @alckasoc! Are you using autoscaling at all with your endpoints? The default scaling of endpoints is based on hardware utilization with 80% threshold. It might be helpful to look at lowering the threshold, or switch to pending requests, and increasing the max replicas above 1. Please feel free to check out Autoscaling for more information and let us know how it goes!
Hi @meganariley , with each replica, does the price per hour increase? What if I’m at the quota limit for the hardware? I think right now I’m using 8 x A100s and I’m at the quota limit.