HF Playground Incorrect Billing -

Hello all, I was testing the HF playground and all my requests together came to only about $0.20. I was testing in the widget on the model page, and now my total is $9.08 (the model is Qwen/Qwen3-235B-A22B). Where can I find the HF Inference pricing, and why is it so high? I got at best 10k tokens for the price of millions.


It seems the billing criteria have changed. In other words, when using large models, the cost per request is now much higher.

Starting in March, usage is billed as compute time × the hourly price of the hardware it runs on.
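Under that model, a sketch of the arithmetic looks like the following. Note the $4.50/hour rate and the 8-second request duration are made-up placeholder numbers for illustration, not Hugging Face's actual pricing:

```python
def request_cost(compute_seconds: float, gpu_hourly_rate: float) -> float:
    """Cost of one request when billed by hardware time:
    seconds of compute x (hourly rate / 3600)."""
    return compute_seconds * gpu_hourly_rate / 3600.0

# A large MoE model may hold expensive multi-GPU hardware for several
# seconds per request, so even short generations add up:
cost = request_cost(compute_seconds=8.0, gpu_hourly_rate=4.50)
print(f"${cost:.4f} per request")
```

With time-based billing, a slow model on pricey hardware costs far more per token than a fast model, regardless of how many tokens it actually emits.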

It sounds like the pricing jumped unexpectedly! Hugging Face’s inference costs can vary based on the model’s size, provider, and token usage. The Qwen/Qwen3-235B-A22B model is a Mixture-of-Experts (MoE) model with 235 billion total parameters, which means it can be significantly more expensive to serve than smaller models.

Where to Find Pricing Details

You can check Hugging Face’s official inference pricing on their model page or explore detailed cost breakdowns on LLM Stats.

Why the Cost Might Be High

  1. MoE Architecture – This model activates 22 billion parameters per token, so it still consumes substantial compute resources despite being sparse.
  2. Token Pricing – Some models charge per million tokens, and if the pricing structure isn’t clear, it can lead to unexpected costs.
  3. Inference Provider Differences – Different providers may have varying rates, so switching providers could help reduce costs.
  4. Hidden Overhead – Some models require additional processing beyond just token generation, increasing the total price.
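To see why the original numbers look off, it helps to compare what per-token pricing would predict. The $2.00-per-million-tokens rate below is an assumed placeholder, not a published HF rate:

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost of a request under simple per-token pricing."""
    return tokens / 1_000_000 * usd_per_million

# ~10k tokens at an assumed $2.00 per million tokens:
estimated = token_cost(10_000, 2.00)
print(f"${estimated:.2f}")  # a few cents, not $9.08
```

If roughly 10k tokens cost $9.08, the effective rate works out to around $900 per million tokens, which strongly suggests the bill reflects compute-time pricing (point 1 above) rather than per-token pricing.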

Next Steps

  • Check the pricing breakdown on Hugging Face’s documentation.
  • Compare providers to see if a different one offers lower rates.
  • Limit token usage by adjusting your request length.
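For the last point, one way to limit token usage is to derive a hard output cap from a budget and pass it as the request's `max_tokens`. This is a sketch: the per-million rate is an assumed placeholder, and `output_token_cap` is a hypothetical helper, not part of any HF library:

```python
def output_token_cap(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Largest number of generated tokens that fits the budget,
    assuming simple per-token pricing."""
    return int(budget_usd / usd_per_million_tokens * 1_000_000)

# $0.05 budget at an assumed $2.00 per million output tokens:
cap = output_token_cap(0.05, 2.00)
print(cap)  # → 25000

# The cap can then bound generation length in a chat-completion request:
request_params = {
    "model": "Qwen/Qwen3-235B-A22B",
    "max_tokens": cap,
}
```

Keep in mind that if billing is compute-time based, capping output tokens only helps indirectly (shorter generations finish faster); checking the provider's actual rate card is still the first step.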

If you need help optimizing your usage, I can suggest ways to reduce token consumption! :rocket: