Serverless Inference Endpoints

pattyd99 · February 12, 2024, 8:31pm

I see that serverless inference endpoints cost a minimum of $0.06/hr (charged by the minute). I can set the minimum number of autoscaling replicas to 0, but the shortest option for the “automatic scale to zero” setting is “after 15 minutes with no activity”.

From what I understand, this means that one request per hour to a “serverless” inference endpoint would be billed for at least 15 minutes (+ runtime), i.e. about $0.015 (15 min × $0.06/hr) for that single request. It also means that one request every 15 minutes would keep the endpoint running for the full hour, so it would cost the same $0.06/hr as leaving it on.
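To double-check the arithmetic, here is a minimal sketch of the billing model as I understand it. This is my assumption, not Hugging Face's actual billing logic: a single replica, billed per whole minute at $0.06/hr, that stays up until it has been idle for 15 minutes. Cold-start time and activity from the previous hour are ignored, and the names `billed_minutes` and `window_cost` are just hypothetical helpers for illustration.

```python
# Hypothetical sketch of the billing model described above (an assumption,
# not Hugging Face's billing code): one replica, billed per whole minute,
# that shuts down after IDLE_TIMEOUT_MIN minutes without requests.

RATE_PER_HOUR = 0.06      # USD/hr, the minimum price quoted above
IDLE_TIMEOUT_MIN = 15     # shortest "scale to zero" threshold, in minutes


def billed_minutes(request_minutes, runtime_min=0, window_min=60,
                   idle_timeout_min=IDLE_TIMEOUT_MIN):
    """Return how many whole minutes in [0, window_min) the replica is up.

    request_minutes: integer minute offsets at which requests arrive.
    runtime_min: minutes each request keeps the replica busy (cold start ignored).
    """
    running = set()
    for start in request_minutes:
        # The replica stays up for the request itself plus the idle timeout;
        # overlapping intervals from nearby requests are merged by the set.
        for minute in range(start, start + runtime_min + idle_timeout_min):
            if 0 <= minute < window_min:
                running.add(minute)
    return len(running)


def window_cost(request_minutes, runtime_min=0, window_min=60):
    """Cost in USD of the minutes billed inside the window."""
    minutes = billed_minutes(request_minutes, runtime_min, window_min)
    return minutes / 60 * RATE_PER_HOUR


# One request per hour: only the 15 idle minutes are billed
print(round(window_cost([0]), 4))               # prints 0.015
# One request every 15 minutes: the replica never scales down
print(round(window_cost([0, 15, 30, 45]), 4))   # prints 0.06
```

Passing a nonzero `runtime_min` adds the request's own runtime on top of the 15 idle minutes, and spacing requests closer than the idle timeout shows how quickly intermittent traffic ends up paying for an always-on replica.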

Am I understanding this correctly? For something marketed as serverless, this seems like a pretty bad option because it cannot scale to zero immediately after a request, and a manual deployment in SageMaker would likely be cheaper for intermittent applications because AWS offers more flexibility around scaling to zero.
