Basically, this is the question: the posts about using Inference Providers seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 is deducted from my account. The provider’s pricing for this model is listed as “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens it takes to send and receive “Hello”.
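For reference, here is the back-of-the-envelope math I am comparing against (the token count is just my rough guess for a “Hello” prompt plus a short reply):

# Rough cost estimate at the listed provider price, assuming ~20 tokens
# total for the "Hello" request and response (token count is a guess).
price_per_million_tokens = 0.10  # USD, as listed for this model
tokens_used = 20
expected_cost = price_per_million_tokens * tokens_used / 1_000_000
print(f"expected: ${expected_cost:.6f}")  # expected: $0.000002
print("observed: $0.03")                  # roughly 15,000x higher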
Also, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I still get charged the same 3 cents.
So the question is: is this a billing bug, or is there a flat charge of $0.03 per request? $0.03 per request adds up pretty fast…
This seems to be the case for all the providers I tried. Here’s my code in case I’m doing something wrong:
import os
import base64  # used when attaching an image in the second test

from huggingface_hub import InferenceClient

model_name = "Qwen/Qwen2-VL-7B-Instruct"

client = InferenceClient(
    model=model_name,
    # provider="fireworks-ai",
    provider="hyperbolic",
    # provider="nebius",
    api_key=os.environ["HF_TOKEN"],
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello",
            }
        ],
    }
]

# Not actually streaming: stream defaults to False, so this returns a
# completion object with a .choices list.
stream = client.chat.completions.create(
    # model=model_name,
    messages=messages,
    max_tokens=500,
    # temperature=0.0,
    # stream=False
)

print([x.message.content for x in stream.choices])
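For the image test mentioned above, I swapped in a message roughly like the sketch below, sending the image as a base64 data URL in an "image_url" content part (the file name is just a placeholder); the charge was the same $0.03:

# Sketch of the image variant of the test (file name is a placeholder).
with open("test_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

image_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Hello"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    messages=image_messages,
    max_tokens=500,
)
print([x.message.content for x in response.choices])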