Kimi 2.5 is not free anymore?

Hello all,

Could you please help me learn how to use open-source LLMs like Kimi 2.5 via an API key, and how to tell whether a model is free (within fair-usage limits) or not? I was using it for a few days and then, all of a sudden, received an email from Hugging Face saying that charges are accruing. How do I know whether a model is paid or free to use, especially Kimi 2.5? Any help is much appreciated. Thanks,


I think the amount of free inference available per month is quite limited.


Is Kimi 2.5 “not free anymore” on Hugging Face?

On Hugging Face, Kimi-K2.5 is open-source (weights), but hosted API inference is a paid compute service.

  • Hugging Face model pages offer pay-as-you-go inference (widget, playground, API) that is “powered by Inference Providers” and “includes a free-tier.” (Hugging Face)
  • The free-tier is a small amount of monthly credits, not unlimited “fair use.” Hugging Face documents: Free $0.10/month, PRO $2/month, and PRO/Team/Enterprise accounts can continue pay-as-you-go once the credits run out. (Hugging Face)
  • For moonshotai/Kimi-K2.5, the supported-models pricing table shows it is priced via providers (example entries): Together $0.50/1M input + $2.80/1M output, Novita $0.60/1M input + $3.00/1M output. (Hugging Face)
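To put those per-million prices in per-call terms, here is a minimal sketch of the arithmetic. The prices are the example Together/Novita entries quoted above (they can change, so re-check the table), and the 2,000-input / 300-output token counts are made-up illustration values.

```python
# Example per-1M-token prices (USD) from the HF supported-models table quoted above.
# Prices change over time; treat these as illustration values only.
PRICES_PER_1M = {
    "together": {"input": 0.50, "output": 2.80},
    "novita": {"input": 0.60, "output": 3.00},
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, billed separately per input and output token."""
    p = PRICES_PER_1M[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens, 300 completion tokens.
    for name in PRICES_PER_1M:
        print(f"{name}: ${request_cost(name, 2_000, 300):.5f}")
```

That works out to roughly $0.002 per call for this made-up request ($0.00184 on Together, $0.00210 on Novita). Each output token is priced about five to six times higher than an input token, so long answers add up quickly.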

So it probably didn’t “become” paid; more likely, your usage ran past the free monthly credits and became billable.


Why you got an email saying “charges are accruing”

Two common reasons:

  1. You exhausted your credits and your account allows pay-as-you-go (PAYG)
    Hugging Face explicitly says PRO and Team/Enterprise orgs can keep using the API after credits are exhausted. (Hugging Face)

  2. You have a payment method and you crossed a billing threshold
    Hugging Face bills credit cards when accrued compute usage exceeds a threshold (and also invoices monthly). (Hugging Face)
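The threshold mechanism can be pictured with a small simulation. The $10 threshold, the `BillingAccount` class, and the reset-to-zero behavior are hypothetical; the thread only says HF charges the card when accrued usage exceeds a threshold and also invoices monthly.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL threshold for illustration; HF's actual threshold is not stated here.
CHARGE_THRESHOLD_USD = 10.0


@dataclass
class BillingAccount:
    accrued: float = 0.0  # metered usage not yet charged
    card_charges: list[float] = field(default_factory=list)

    def record_usage(self, cost: float) -> None:
        """Add metered usage; charge the card once accrued usage reaches the threshold."""
        self.accrued += cost
        if self.accrued >= CHARGE_THRESHOLD_USD:
            self.card_charges.append(round(self.accrued, 6))
            self.accrued = 0.0

    def month_end_invoice(self) -> float:
        """Whatever is still accrued at month end goes on the monthly invoice."""
        amount, self.accrued = round(self.accrued, 6), 0.0
        return amount


if __name__ == "__main__":
    acct = BillingAccount()
    for _ in range(7):
        acct.record_usage(2.0)  # seven $2 chunks of usage
    print(acct.card_charges, acct.month_end_invoice())
```

In this toy model, seven $2.00 chunks of usage produce one $10.00 card charge and leave $4.00 for the month-end invoice; each card charge is a natural moment for a billing email.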

If you are on the Free plan, the docs list extra usage beyond the credits as “no” (so you would usually hit errors instead of accruing charges). An “accruing charges” email usually implies a PRO plan, org billing, a card on file, or some combination of these. (Hugging Face)
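Why the same usage ends in an error on one account and in an “accruing charges” email on another can be sketched as a toy monthly simulation. The credit amounts are the ones quoted in this thread; the flat cost per request, the whole-request deduction, and the `CreditsExhausted` error are simplifying assumptions, not HF's actual billing logic.

```python
class CreditsExhausted(Exception):
    """Stand-in for the error a Free account hits once its credits are used up."""


# Monthly credits in micro-USD (integers avoid float drift): $0.10 Free, $2.00 PRO.
MONTHLY_CREDITS_MICRO_USD = {"free": 100_000, "pro": 2_000_000}
# As summarized above: Free has no extra usage; PRO continues pay-as-you-go.
PAYG_ALLOWED = {"free": False, "pro": True}


def simulate_month(plan: str, cost_per_request: int, n_requests: int) -> int:
    """Return billable micro-USD beyond the credits; raise if the plan cannot go PAYG."""
    credits = MONTHLY_CREDITS_MICRO_USD[plan]
    billable = 0
    for i in range(1, n_requests + 1):
        if credits >= cost_per_request:
            credits -= cost_per_request  # paid for by monthly credits
        elif PAYG_ALLOWED[plan]:
            billable += cost_per_request  # this is the "charges are accruing" part
        else:
            raise CreditsExhausted(f"request {i}: credits exhausted on the {plan} plan")
    return billable
```

With a made-up cost of $0.002 per request (2,000 micro-USD), Free covers 50 requests and then errors, while PRO covers 1,000 requests and then accrues charges: 200 extra requests add $0.40.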


How to tell if a model is free vs paid on HF (fast checklist)

Step 1 — Are you using hosted inference or just downloading weights?

  • If you clicked “Try it” / Playground or called the model via API, that’s hosted inference (metered). (Hugging Face)
  • Downloading weights to run locally is different (no HF inference charges).

Step 2 — Check the Inference Providers supported-models pricing table

Go to HF’s Inference Providers → Supported Models list and search the model:

  • If you see Input/1M and Output/1M values that are non-zero, it’s paid (credits may offset a tiny amount).
  • For Kimi-K2.5, prices are listed (Together/Novita). (Hugging Face)

Step 3 — Know what “free-tier” means on HF

HF’s definition of “free-tier” here is monthly credits:

  • Free: $0.10/month
  • PRO: $2/month
  • PRO/Team/Enterprise: PAYG after credits (Hugging Face)

Step 4 — Be careful with “Custom Provider Key”

HF offers two billing modes:

  • Routed by Hugging Face: credits apply; HF bills usage.
  • Custom Provider Key: credits do not apply; the provider bills you directly. (Hugging Face)

How to use Kimi-K2.5 via API key on Hugging Face

Hugging Face Inference Providers gives you an OpenAI-compatible endpoint. (Hugging Face)

Option A) OpenAI SDK → HF Router (recommended)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5:cheapest",  # or ":together", ":novita", ":fastest"
    messages=[{"role": "user", "content": "Explain gradient descent simply."}],
    max_tokens=300,
)

print(resp.choices[0].message.content)

HF documents:

  • router base URL https://router.huggingface.co/v1 (Hugging Face)
  • provider selection via suffix: :fastest, :cheapest, or :<provider> (Hugging Face)
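The suffix convention is easy to get wrong in config files, so a small helper that splits and labels the model string can catch mistakes before a billable call. This is a hypothetical local helper, not part of any HF library; the policy names come from the list above, and any other suffix is assumed to be a provider name.

```python
from typing import Optional

POLICIES = {"fastest", "cheapest"}  # selection policies listed in the HF docs quoted above


def split_model_ref(ref: str) -> tuple[str, Optional[str]]:
    """Split 'org/model:suffix' into (model_id, suffix); suffix is None if absent."""
    model_id, sep, suffix = ref.rpartition(":")
    if not sep:
        return ref, None
    if not model_id or not suffix:
        raise ValueError(f"malformed model reference: {ref!r}")
    return model_id, suffix


def describe(ref: str) -> str:
    """Human-readable summary of how the router is being asked to pick a provider."""
    model_id, suffix = split_model_ref(ref)
    if suffix is None:
        return f"{model_id}: router default"
    if suffix in POLICIES:
        return f"{model_id}: policy '{suffix}'"
    return f"{model_id}: pinned to provider '{suffix}'"


if __name__ == "__main__":
    for ref in ["moonshotai/Kimi-K2.5", "moonshotai/Kimi-K2.5:cheapest", "moonshotai/Kimi-K2.5:novita"]:
        print(describe(ref))
```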

Option B) huggingface_hub InferenceClient

import os
from huggingface_hub import InferenceClient

# Pin the provider with the `provider` argument rather than a model-name suffix.
client = InferenceClient(provider="together", api_key=os.environ["HF_TOKEN"])

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "Write a short Python prime checker."}],
    max_tokens=300,
)

print(resp.choices[0].message.content)
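Both SDK options send an ordinary OpenAI-style HTTPS request, so a stdlib-only sketch can make the API-key part explicit: the HF token goes in an `Authorization: Bearer` header. This assumes the standard OpenAI-compatible `/chat/completions` path under the router base URL above; `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed: OpenAI-compatible chat completions path under the documented router base URL.
ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"


def build_request(token: str, model: str, prompt: str, max_tokens: int = 300) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request to the HF router."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # cap output: output tokens are the pricier ones
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request (a real, billable call) and return the reply text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Calling `send(build_request(os.environ["HF_TOKEN"], "moonshotai/Kimi-K2.5:cheapest", "Say hi.", max_tokens=20))` would make a real, metered request. The SDK options above remain the more convenient route; this only shows what goes over the wire.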

Practical tips to avoid surprise charges (especially with Kimi-K2.5)

  1. Always set max_tokens (output tokens are usually the expensive part).
  2. Prefer :cheapest while prototyping, but verify which provider it selects. (Hugging Face)
  3. Watch for huge prompts: Kimi-K2.5 has a very large context window, so it is easy to accidentally send long logs or documents. (Hugging Face)
  4. Check the billing dashboard when you start a new run; HF can bill as soon as you cross a threshold. (Hugging Face)
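Tips 1 and 3 can be combined into a small pre-flight check that refuses to send a request whose rough worst-case cost exceeds a budget. The 4-characters-per-token ratio is a common rule of thumb for English text, not Kimi's actual tokenizer, and the default prices are the Together example figures from the pricing table; the helper names are hypothetical, so swap in real numbers before relying on it.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (English rule of thumb)."""
    return max(1, len(text) // 4)


def worst_case_cost(prompt: str, max_tokens: int,
                    input_per_1m: float = 0.50, output_per_1m: float = 2.80) -> float:
    """Rough worst case in USD: estimated input tokens plus the full max_tokens budget."""
    return (estimate_tokens(prompt) * input_per_1m + max_tokens * output_per_1m) / 1_000_000


def check_budget(prompt: str, max_tokens: int, budget_usd: float) -> None:
    """Raise before sending if the rough worst-case cost of one request exceeds the budget."""
    cost = worst_case_cost(prompt, max_tokens)
    if cost > budget_usd:
        raise RuntimeError(f"estimated worst case ${cost:.4f} exceeds budget ${budget_usd:.4f}")
```

A 4,000-character prompt with max_tokens=300 comes to about $0.00134 and passes a $0.01 budget, while an accidentally pasted 4-million-character log (roughly 1M tokens, about $0.50 of input alone) is stopped before it costs anything.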

What I think happened in your case (most likely)

You were making hosted calls to moonshotai/Kimi-K2.5 through Inference Providers (widget/playground/router/SDK). It appeared “free” for a few days because you were within monthly credits, then your account started PAYG billing (PRO/org + payment method), prompting the “charges accruing” email. (Hugging Face)