429 Errors and Model Overloaded - Dedicated Endpoint

jamie-de · October 22, 2024, 7:34pm

Hi! We have a model on a dedicated endpoint that keeps returning 429 errors and “Model overloaded” errors. What are the rate limits for a dedicated endpoint? I can’t find that information anywhere. We are using the following instance size:

GPU · Nvidia L4 · 1x GPU · 24 GB
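
Until the actual limits are clear, one common client-side mitigation is to retry throttled requests with exponential backoff and jitter. The sketch below is only an illustration under stated assumptions: `call_with_backoff`, `send`, and the set of retryable status codes are hypothetical names and choices, not part of any Hugging Face API, and treating 503 as the status behind “Model overloaded” is a guess.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: throttling and overload show up as one of these HTTP statuses.
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body). At most max_attempts calls are made.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        # Exponential backoff with full jitter: the upper bound doubles on
        # each retry (capped at max_delay) and the actual wait is random.
        upper = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, upper))
        status, body = send()
    return status, body
```

Here `send` would wrap whatever HTTP client issues the request to the endpoint URL. Backoff only smooths out short bursts; if requests arrive faster than a single L4 replica can serve them, the errors will come back.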
