Hi! We are hitting 429 errors and “Model overloaded” errors with one of our models. What are the rate limits for a dedicated endpoint? I can’t find that information anywhere. We are using the following instance size:
GPU · Nvidia L4 · 1x GPU · 24 GB