Misunderstanding about inference endpoint billing

victochai · October 4, 2023, 7:39pm

Hi everyone!

I managed to deploy a model and link it to my infrastructure. During tests, I ran inference on only 2 images. Response time was about 10-15 s each, but I have been billed for 1 minute 40 seconds and more (I had to pause the endpoint).
The documentation states exactly:
"Pay for compute resources uptime by the minute, billed monthly.
As low as $0.06 per CPU core/hr and $0.6 per GPU/hr."
which led me to assume that as long as the endpoint is not being used, I am not being billed. Am I wrong, or did I miss something?
Have a good evening.
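
To make the question concrete, here is a rough sketch of the arithmetic under one reading of the quoted sentence: that the meter follows endpoint uptime (rounded up to whole minutes) rather than the time spent answering requests. The rounding rule and the function name `uptime_cost` are assumptions made for illustration, not confirmed billing behavior; the only figures taken from the post are the quoted "as low as" GPU rate, the 2 requests of up to 15 s, and the 1 minute 40 seconds billed.

```python
import math

# "As low as" GPU rate quoted from the documentation, in dollars per hour.
GPU_RATE_PER_HOUR = 0.6


def uptime_cost(uptime_seconds, rate_per_hour=GPU_RATE_PER_HOUR):
    """Cost if billing follows uptime, rounded up to whole minutes.

    Assumption: partial minutes are rounded up. The quoted docs only say
    "by the minute"; the exact rounding is not stated there.
    """
    billed_minutes = math.ceil(uptime_seconds / 60)
    return billed_minutes * rate_per_hour / 60


# What I expected: only the two requests count (2 x 15 s at most).
request_seconds = 2 * 15

# What the dashboard showed: 1 minute 40 seconds of compute time.
billed_uptime_seconds = 100

print(f"per-request reading: ${uptime_cost(request_seconds):.2f}")
print(f"uptime reading:      ${uptime_cost(billed_uptime_seconds):.2f}")
```

Under this reading the running endpoint is what gets metered, so the billed time can exceed the time spent on requests even when only a couple of requests were sent.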

GoldenGen · October 23, 2024, 10:26pm

Hello! Did you manage to figure anything out? Same thing happening to me.

langhoangal · February 5, 2025, 12:36am

Same here. After deployment, I tested with 9 requests within a few minutes. Later (after a few hours) I saw that the compute time was 46 minutes. I feel unsafe using this service.
