Inference Providers: 3 cents per request?

Basically, this is the question: the posts about Inference Providers seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 is deducted from my account. The provider’s pricing for this model is listed as “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens it takes to send and receive “Hello”.
Also, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I am still charged the same 3 cents.

So the question is: is this a billing bug, or is there a flat charge of $0.03 per request? $0.03 per request would add up pretty fast…

This seems to be the case for all the providers I tried. Here’s my code in case I’m doing something wrong:

import os
from huggingface_hub import InferenceClient
import base64


model_name = "Qwen/Qwen2-VL-7B-Instruct"

client = InferenceClient(
    model=model_name,
    #provider="fireworks-ai",
    provider="hyperbolic",
    #provider="nebius",
    api_key=os.environ['HF_TOKEN']
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Hello"""
            }
        ]
    }
]


# Non-streaming call, so the result is a regular completion, not a stream
response = client.chat.completions.create(
    #model=model_name,
    messages=messages,
    max_tokens=500,
    #temperature=0.0,
    #stream=False
)


print([x.message.content for x in response.choices])
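For comparison, the token-based cost at the provider’s listed rate is easy to work out: the completion response includes a `usage` field with prompt and completion token counts, which you can plug into the per-million price. A minimal sketch of that arithmetic (the $0.10/M figure is the rate quoted above; the token counts are just illustrative):

```python
def expected_cost_usd(prompt_tokens: int, completion_tokens: int,
                      price_per_million_usd: float) -> float:
    """Cost at a flat per-token rate, with no per-request fee."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * price_per_million_usd / 1_000_000

# A "Hello" round trip of roughly 20 tokens at $0.10 / M tokens:
print(f"${expected_cost_usd(10, 10, 0.10):.8f}")  # → $0.00000200
```

That is about $0.000002, i.e. four orders of magnitude below the $0.03 that was actually deducted, which is what makes a flat per-request charge (or a billing bug) the more plausible explanation.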

@meganariley Question about pricing.


Hi @wexly,

We’re not billing Inference Providers usage yet (it only draws from the free included credits), so we’re using imperfect approximation heuristics for some of the providers.

We will be shipping accurate pricing, and it will go live in the next week. I’ll post here when it’s live.


I see, thank you very much. I was excited about this feature but got a bit scared when I saw the numbers :flushed_face: … Sorry for the false alarm, and looking forward to using this feature!


Updated.
