Hello all, I was testing the HF playground and all my requests came to only $0.20. I was testing in the window on the model page, and now my total is $9.08 (model is Qwen/Qwen3-235B-A22B). Where can I find the HF Inference pricing, and why is it so high? I got at best 10k tokens for the price of millions.
It seems the billing criteria have changed. In other words, when using large models, the cost per request becomes expensive.
Starting in March, usage is billed as compute time × the price of the hardware.
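To make that billing model concrete, here is a minimal sketch of compute-time pricing. The hourly rate and runtime below are illustrative assumptions, not official Hugging Face prices:

```python
# Hedged sketch of time-based billing: cost = compute time x hardware price.
# The $10/hour rate below is an assumed figure for large multi-GPU hardware,
# not an actual HF Inference price.

def estimate_cost(compute_seconds: float, hardware_hourly_rate: float) -> float:
    """Estimate the cost of a request billed by compute time."""
    return (compute_seconds / 3600.0) * hardware_hourly_rate

# A single 30-second generation on assumed $10/hour hardware:
print(f"${estimate_cost(30, 10.0):.4f}")  # -> $0.0833
```

The key point is that the cost scales with wall-clock time on the hardware, not with how many tokens you get out, so a slow large model can be expensive even for short outputs.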
It sounds like the pricing jumped unexpectedly! Hugging Face's inference costs can vary based on the model's size, provider, and token usage. The Qwen/Qwen3-235B-A22B model is a Mixture-of-Experts (MoE) model with 235 billion parameters, which means it can be significantly more expensive to run than smaller models.
Where to Find Pricing Details
You can check Hugging Face’s official inference pricing on their model page or explore detailed cost breakdowns on LLM Stats.
Why the Cost Might Be High
- MoE Architecture – Although only 22 billion parameters are active per token, the full 235B-parameter model must be loaded, so it still requires large multi-GPU hardware.
- Token Pricing – Some models charge per million tokens, and if the pricing structure isn’t clear, it can lead to unexpected costs.
- Inference Provider Differences – Different providers may have varying rates, so switching providers could help reduce costs.
- Hidden Overhead – Some models require additional processing beyond just token generation, increasing the total price.
Next Steps
- Check the pricing breakdown on Hugging Face’s documentation.
- Compare providers to see if a different one offers lower rates.
- Limit token usage by adjusting your request length.
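To illustrate why time-based billing can dwarf per-token billing for a large model, here is a rough comparison. All rates and the throughput figure are assumptions chosen for illustration, not quoted prices:

```python
# Hedged comparison: per-token billing vs time-based billing.
# All numbers below are illustrative assumptions, not real provider rates.

tokens = 10_000                  # tokens actually generated
per_million_rate = 2.00          # assumed $/1M tokens at a per-token provider
tokens_per_second = 30           # assumed throughput for a very large MoE model
hardware_hourly_rate = 20.0      # assumed multi-GPU hourly price

per_token_cost = tokens / 1_000_000 * per_million_rate
seconds_needed = tokens / tokens_per_second
time_based_cost = seconds_needed / 3600 * hardware_hourly_rate

print(f"per-token billing:  ${per_token_cost:.2f}")   # -> $0.02
print(f"time-based billing: ${time_based_cost:.2f}")  # -> $1.85
```

Under these assumed numbers, the same 10k tokens cost roughly 90× more when billed by compute time, which matches the kind of surprise described in the original question.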
If you need help optimizing your usage, I can suggest ways to reduce token consumption!
Okay, so we are charged per unit of time on the HF Inference API, which means for now the solution is to use the other providers? Also, is there a way to disable providers you don't want to use?
Also, is there a way to set a spending ceiling for my account?
If I had used R1 for the same task, it wouldn't have cost this much through Replicate, for example.
The payment limit is set to $100 by default. (I think it was already there when I first joined HF.)
Changing this should be sufficient for personal use.
Detailed limits for the Inference API can apparently be set with Enterprise subscriptions. If multiple people are using it, it's more convenient to have separate limits for each service.
Individual on/off settings for Inference Providers can be configured on the settings page.
Edit:
The payment limit is set to $100 by default
Oh… that was wrong…
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.