429 Errors and Model Overloaded - Dedicated Endpoint

Hi! We have a model on a dedicated endpoint that keeps returning 429 errors and “Model overloaded” errors. What are the rate limits for a dedicated endpoint? I can’t seem to find that information anywhere. We are using the following instance size:

GPU · Nvidia L4 · 1x GPU · 24 GB