I see that serverless inference endpoints cost a minimum of $0.06/hr (billed per minute). I can set the autoscaling minimum to 0 replicas, but the shortest threshold available for the “automatic scale to zero” setting is “after 15 minutes with no activity”.
From what I understand, this means that a single request per hour to a “serverless” inference endpoint would be billed for at least 15 minutes (plus the request's runtime), i.e. about 15/60 × $0.06 = $0.015/hr at minimum. It also means that a single request every 15 minutes would keep the endpoint from ever scaling to zero, so I would pay for a full hour of runtime ($0.06/hr).
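To sanity-check that arithmetic, here is a minimal Python sketch of the billing model as I understand it. It assumes the $0.06/hr rate, per-minute billing with each up-interval rounded up to whole minutes, a 15-minute idle timer that restarts whenever a request arrives, and no charge for cold-start time. These are my assumptions rather than documented billing rules, and `billed_minutes` / `hourly_cost` are hypothetical helper names.

```python
import math

RATE_PER_HOUR = 0.06   # minimum serverless rate (USD/hr), as quoted above
IDLE_TIMEOUT_MIN = 15  # shortest "scale to zero after N minutes" setting


def billed_minutes(arrivals, runtime_min, horizon_min,
                   idle_timeout_min=IDLE_TIMEOUT_MIN):
    """Minutes the replica is up within [0, horizon_min].

    Assumption: the replica comes up at each arrival, stays up for the
    request's runtime, then idles for idle_timeout_min before scaling to
    zero. A request arriving before (or exactly when) the idle timer
    expires extends the same up-interval. Each up-interval is rounded up
    to whole minutes (per-minute billing). Cold-start time is ignored.
    """
    intervals = []
    for t in sorted(arrivals):
        end = min(t + runtime_min + idle_timeout_min, horizon_min)
        if intervals and t <= intervals[-1][1]:
            # Replica is still up: extend the current interval.
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            # Replica had scaled to zero: start a new up-interval.
            intervals.append([t, end])
    return sum(math.ceil(end - start) for start, end in intervals)


def hourly_cost(period_min, runtime_min, hours=24):
    """Average USD/hr for one request every period_min minutes."""
    horizon = hours * 60
    arrivals = range(0, horizon, period_min)
    minutes = billed_minutes(arrivals, runtime_min, horizon)
    return minutes / 60 * RATE_PER_HOUR / hours
```

Under these assumptions, `hourly_cost(60, 0)` comes out to about $0.015/hr and `hourly_cost(15, 0)` to about $0.06/hr, which matches the two cases above; a 1-minute runtime with one request per hour (`hourly_cost(60, 1)`) gives about $0.016/hr.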
Am I understanding this correctly? For something labeled serverless, this seems like a poor choice because the endpoint can't scale to zero right after a request, and a manual deployment in SageMaker would likely be cheaper for intermittent workloads given AWS's more flexible scale-to-zero options.