Basically, this is the question: the posts about using Inference Providers seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 is deducted from my account. The provider’s pricing for this model is listed as “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens it takes to send and receive “Hello”.
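For reference, here is the back-of-the-envelope math I am comparing against (the token count is just my rough guess for a “Hello” prompt plus a short reply):

# Rough cost estimate at the listed provider price, assuming ~20 tokens
# total for the "Hello" request and response (token count is a guess).
price_per_million_tokens = 0.10  # USD, as listed for this model
tokens_used = 20
expected_cost = price_per_million_tokens * tokens_used / 1_000_000
print(f"expected: ${expected_cost:.6f}")  # expected: $0.000002
print("observed: $0.03")                  # roughly 15,000x higher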
Also, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I still get charged the same 3 cents.
So the question is: is this a billing bug, or is there a flat charge of $0.03 per request? $0.03 per request adds up pretty fast…
This seems to be the case for all the providers I tried. Here’s my code in case I’m doing something wrong:
import os
import base64  # used when attaching an image in the second test

from huggingface_hub import InferenceClient

model_name = "Qwen/Qwen2-VL-7B-Instruct"

client = InferenceClient(
    model=model_name,
    # provider="fireworks-ai",
    provider="hyperbolic",
    # provider="nebius",
    api_key=os.environ["HF_TOKEN"],
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello",
            }
        ],
    }
]

# Not actually streaming: stream defaults to False, so this returns a
# completion object with a .choices list.
stream = client.chat.completions.create(
    # model=model_name,
    messages=messages,
    max_tokens=500,
    # temperature=0.0,
    # stream=False
)

print([x.message.content for x in stream.choices])
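For the image test mentioned above, I swapped in a message roughly like the sketch below, sending the image as a base64 data URL in an "image_url" content part (the file name is just a placeholder); the charge was the same $0.03:

# Sketch of the image variant of the test (file name is a placeholder).
with open("test_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

image_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Hello"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    messages=image_messages,
    max_tokens=500,
)
print([x.message.content for x in response.choices])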