Hi everyone,
I’ve been using the Inference API with Hugging Face as the only provider, and I’m having trouble understanding how to estimate and track the actual cost of my requests.
For example, when using a model like stabilityai/stable-diffusion-3.5-large:
- What kind of hardware (GPU) is this model running on?
- How can I see how long my individual requests are taking (in seconds)? Right now I’m only timing them client-side, as in the snippet after this list.
- And how exactly is the billing calculated based on that?
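For context, here’s roughly what my client-side timing looks like at the moment. This is just a minimal sketch assuming the standard api-inference endpoint and a plain `requests` call; I have no idea whether this wall-clock time corresponds to the duration that’s actually billed:

```python
import os
import time
import requests

# Assumed endpoint URL and payload shape, based on my reading of the docs.
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {"inputs": "an astronaut riding a horse on the moon"}

start = time.perf_counter()
response = requests.post(API_URL, headers=headers, json=payload)
elapsed = time.perf_counter() - start

# Client-side wall time includes network latency and queueing, not just GPU time.
print(f"HTTP {response.status_code}, wall time: {elapsed:.2f}s")
# On success, response.content holds the generated image bytes.
```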
More broadly, is there a way to:
- See which models run on which hardware,
- Get a rough idea of how much they cost per second,
- And know the average inference time for each model?
I’m considering using the API in a commercial product, but the current lack of pricing transparency is holding me back. Knowing the per-second cost and expected runtime for each model would help a lot with planning and budgeting.
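To make the budgeting concern concrete, this is the kind of back-of-the-envelope estimate I’d like to be able to make. All of the numbers below are made up, since I can’t find the real figures anywhere:

```python
# All numbers here are placeholders -- exactly the values I'm asking how to find.
price_per_gpu_second = 0.0012    # assumed $/GPU-second for the hardware the model runs on
avg_seconds_per_request = 8.0    # assumed average inference time for this model
requests_per_month = 50_000

monthly_cost = price_per_gpu_second * avg_seconds_per_request * requests_per_month
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")  # -> $480.00
```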
Would really appreciate it if someone from the team or community could clarify this or point me to relevant documentation.
Thanks in advance!