Hi everyone,
I’ve been using the Inference API with Hugging Face as the only provider, and I’m having trouble understanding how to estimate and track the actual cost of my requests.
For example, when using a model like stabilityai/stable-diffusion-3.5-large:
- What kind of hardware (GPU) is this model running on?
- How can I see how long my individual requests are taking (in seconds)? Right now I’m only timing them client-side, as in the snippet after this list.
- And how exactly is the billing calculated based on that?
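For context, here’s roughly what my client-side timing looks like at the moment. This is just a minimal sketch assuming the standard api-inference endpoint and a plain `requests` call; I have no idea whether this wall-clock time corresponds to the duration that’s actually billed:

```python
import os
import time
import requests

# Assumed endpoint URL and payload shape, based on my reading of the docs.
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {"inputs": "an astronaut riding a horse on the moon"}

start = time.perf_counter()
response = requests.post(API_URL, headers=headers, json=payload)
elapsed = time.perf_counter() - start

# Client-side wall time includes network latency and queueing, not just GPU time.
print(f"HTTP {response.status_code}, wall time: {elapsed:.2f}s")
# On success, response.content holds the generated image bytes.
```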
More broadly, is there a way to:
- See which models run on which hardware,
- Get a rough idea of how much they cost per second,
- And know the average inference time for each model?
I’m considering using the API in a commercial product, but the current lack of pricing transparency is holding me back. Knowing the per-second cost and expected runtime for each model would help a lot with planning and budgeting.
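To make the budgeting concern concrete, this is the kind of back-of-the-envelope estimate I’d like to be able to make. All of the numbers below are made up, since I can’t find the real figures anywhere:

```python
# All numbers here are placeholders -- exactly the values I'm asking how to find.
price_per_gpu_second = 0.0012    # assumed $/GPU-second for the hardware the model runs on
avg_seconds_per_request = 8.0    # assumed average inference time for this model
requests_per_month = 50_000

monthly_cost = price_per_gpu_second * avg_seconds_per_request * requests_per_month
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")  # -> $480.00
```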
Would really appreciate it if someone from the team or community could clarify this or point me to relevant documentation.
Thanks in advance!