Hello all, I was testing the HF playground and all my requests came to only $0.20. I was testing in the window on the model page, and now my total is $9.08 (model is Qwen/Qwen3-235B-A22B). Where can I find the HF Inference pricing, and why is it so high? I got at best 10k tokens for the price of millions.
It seems the billing criteria have changed. In other words, when using large models, the cost per request becomes expensive.
Starting in March, usage is billed as compute time × the price of the hardware.
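To make that billing model concrete, here is a minimal sketch of compute-time pricing. The hourly rate and runtime below are illustrative assumptions, not official Hugging Face prices:

```python
# Hedged sketch of time-based billing: cost = compute time x hardware price.
# The $10/hour rate below is an assumed figure for large multi-GPU hardware,
# not an actual HF Inference price.

def estimate_cost(compute_seconds: float, hardware_hourly_rate: float) -> float:
    """Estimate the cost of a request billed by compute time."""
    return (compute_seconds / 3600.0) * hardware_hourly_rate

# A single 30-second generation on assumed $10/hour hardware:
print(f"${estimate_cost(30, 10.0):.4f}")  # -> $0.0833
```

The key point is that the cost scales with wall-clock time on the hardware, not with how many tokens you get out, so a slow large model can be expensive even for short outputs.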
It sounds like the pricing jumped unexpectedly! Hugging Face's inference costs can vary based on the model's size, provider, and token usage. The Qwen/Qwen3-235B-A22B model is a Mixture-of-Experts (MoE) model with 235 billion parameters, which means it can be significantly more expensive to run than smaller models.
Where to Find Pricing Details
You can check Hugging Face’s official inference pricing on their model page or explore detailed cost breakdowns on LLM Stats.
Why the Cost Might Be High
- MoE Architecture – Although only 22 billion parameters are active per token, the full 235B-parameter model must be loaded, so it still requires large multi-GPU hardware.
- Token Pricing – Some models charge per million tokens, and if the pricing structure isn’t clear, it can lead to unexpected costs.
- Inference Provider Differences – Different providers may have varying rates, so switching providers could help reduce costs.
- Hidden Overhead – Some models require additional processing beyond just token generation, increasing the total price.
Next Steps
- Check the pricing breakdown on Hugging Face’s documentation.
- Compare providers to see if a different one offers lower rates.
- Limit token usage by adjusting your request length.
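To illustrate why time-based billing can dwarf per-token billing for a large model, here is a rough comparison. All rates and the throughput figure are assumptions chosen for illustration, not quoted prices:

```python
# Hedged comparison: per-token billing vs time-based billing.
# All numbers below are illustrative assumptions, not real provider rates.

tokens = 10_000                  # tokens actually generated
per_million_rate = 2.00          # assumed $/1M tokens at a per-token provider
tokens_per_second = 30           # assumed throughput for a very large MoE model
hardware_hourly_rate = 20.0      # assumed multi-GPU hourly price

per_token_cost = tokens / 1_000_000 * per_million_rate
seconds_needed = tokens / tokens_per_second
time_based_cost = seconds_needed / 3600 * hardware_hourly_rate

print(f"per-token billing:  ${per_token_cost:.2f}")   # -> $0.02
print(f"time-based billing: ${time_based_cost:.2f}")  # -> $1.85
```

Under these assumed numbers, the same 10k tokens cost roughly 90× more when billed by compute time, which matches the kind of surprise described in the original question.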
If you need help optimizing your usage, I can suggest ways to reduce token consumption!
Okay, so we are charged per unit of time on the HF Inference API, which means for now the solution is to use the other providers? Also, is there a way to disable providers you don't want to use?
Also, is there a way to set a spending ceiling for my account?
If I had used R1 for the same task, it wouldn't have cost this much through Replicate, for example.
The payment limit is set to $100 by default. (I think it was already there when I first joined HF.)
Changing this should be sufficient for personal use.
Detailed limits for the Inference API can apparently be set with Enterprise subscriptions. If multiple people are using it, it's more convenient to have separate limits for each service.
Individual on/off settings for Inference Providers can be configured on the settings page.
Edit:
The payment limit is set to $100 by default
Oh… that was wrong…
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.