API inference limit changed?

I have been using the text-to-image Inference API for a few months, always at the same usage level and with pretty much the same model (prithivMLmods/Canopus-LoRA-Flux-UltraRealism-2.0), and I never hit my monthly maximum. But now, after only 4 days, I got a 402 error saying I had already reached my allowance:

“You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly allowance.”

Did anything change in the limits for the free tier?


Maybe related:

OK, so if I understand correctly, this should be fixed soon-ish, hopefully for the better?


I hope so.

Hi,

I had the same experience. I had been using LLaMa-3.3-70B for several months through a PRO subscription. Each day I compare summarization results on news stories (40-70 stories/day, 700-2,500 tokens) across different models/APIs: GPT-4o, Gemini, LLaMa-3.3-70B, etc.

When I got rate-limited, I opened a second account to see what the "shadow charge" on PRO users was. Over two days, I used up the $2 credit after around 80 stories. The equivalent charge from OpenAI was ~$0.40.
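A rough back-of-the-envelope comparison of those numbers, assuming the full $2 credit was consumed by the same ~80 stories that OpenAI billed ~$0.40 for:

```python
# Per-story cost comparison based on the figures above.
# Assumes the full $2.00 credit covered ~80 stories, and OpenAI's
# ~$0.40 covered the same 80 stories (both are rough observations).
hf_total = 2.00      # USD of Inference credit consumed
openai_total = 0.40  # USD for the equivalent OpenAI usage
stories = 80

hf_per_story = hf_total / stories          # USD per story on HF
openai_per_story = openai_total / stories  # USD per story on OpenAI
ratio = hf_per_story / openai_per_story    # how much more HF costs

print(f"HF: ${hf_per_story:.3f}/story, "
      f"OpenAI: ${openai_per_story:.3f}/story, "
      f"ratio: {ratio:.1f}x")
```

Under these assumptions the gap works out to roughly 5x per story, which lines up with the "less competitive than commercial" impression below.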

Is the coming PRO "pay as you go" pricing likely to be that high?

Thanks, -Charlie Dolan

PS I really miss using LLaMa-3.3-70B, because it was very often right on the mark summarizing long, discursive news analyses and blog posts when GPT-4o, Sonnet 3.5, and Gemini 2.0 all whiffed.


If you need to use Llama ASAP, try Groq; they have a free tier you can use via API while HF fixes its API issues, which hopefully will be soon :crossed_fingers:


Hi, I just noticed the start of "pay as you go" for PRO.

(a) Thanks for the quick implementation. The time to update the total due on the billing page is super quick: just as fast as OpenAI, and faster than Google or Anthropic. Kudos.

(b) It looks like the cost is pretty close to what I saw before, so less competitive than the commercial providers (5-8x), but still reasonable given that Hugging Face provides a high-quality environment for models that are hard to access otherwise.

(c) With this pricing, I can continue to use my PRO subscription to access LLaMa-3.3-70B (or other open-source models), not as a primary summarization tool, but as a check when one of the hyperscalers whiffs on a new summary. Probably closer to what PRO is meant for ;->

(d) At some point, I will try an Inference Endpoint on Hugging Face to get a per-story cost for compute alone (after paying to spin up the instance).

(e) Thanks again!
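The per-story endpoint cost in point (d) is simple arithmetic once you know the instance's hourly rate and your throughput. A sketch with hypothetical numbers (the hourly rate and throughput below are placeholders, not actual Hugging Face Inference Endpoints pricing):

```python
# Hypothetical per-story compute cost on a dedicated Inference Endpoint.
# Both inputs are illustrative assumptions, not real HF pricing figures.
hourly_rate = 4.50       # USD/hour for the GPU instance (assumed)
stories_per_hour = 120   # summaries the endpoint processes per hour (assumed)

cost_per_story = hourly_rate / stories_per_hour
print(f"compute cost per story: ${cost_per_story:.4f}")
```

Note this only amortizes raw compute; idle time while the endpoint is spun up but not summarizing would raise the effective per-story cost.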


Hi, I finally got a chance to integrate the Inference API on the machines where I read news summaries, and I noticed the following behaviors: (1) I was required to re-request authorization for LLaMa-3.3-70B; (2) after it came through as ACCEPTED on the "Gated Repos Status" page, it took a few hours to flow through to the Inference API; and (3) it does not seem to charge anything when I check for updates on the Billing page under inference usage, i.e., it still shows the $2.74 balance due that I generated in my last experiment.

Is this the new behaviour, or will I see the charges eventually?

Thanks, -Charlie

PS I still have not tried an Inference Endpoint for a large number of summaries with LLaMa-3.3-70B.
