API inference limit changed?

I have been using the text-to-image Inference API for a few months, always at the same usage level and with pretty much the same model (prithivMLmods/Canopus-LoRA-Flux-UltraRealism-2.0), and I never hit my monthly maximum. But now, after only 4 days, I got a 402 error saying I had already reached my allowance:

“You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly allowance.”

Did anything change in the limits for the free tier?


Maybe related:

OK, so if I understand correctly, this should be fixed soon-ish, hopefully for the better?


I hope so.

Hi,

I had the same experience. I had been using LLaMa-3.3-70B for several months through a PRO subscription. Each day I compare summarization results on news stories (40-70 stories/day, 700-2,500 tokens) across different models/APIs: GPT-4o, Gemini, LLaMa-3.3-70B, etc.

When I got rate-limited, I opened a second account to see what the "shadow charge" on PRO users was. Over two days, I used up the $2 credit after around 80 stories. The equivalent charge from OpenAI was ~$0.40.
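A rough back-of-the-envelope comparison of those numbers, assuming the full $2 credit was consumed by the same ~80 stories that OpenAI billed ~$0.40 for:

```python
# Per-story cost comparison based on the figures above.
# Assumes the full $2.00 credit covered ~80 stories, and OpenAI's
# ~$0.40 covered the same 80 stories (both are rough observations).
hf_total = 2.00      # USD of Inference credit consumed
openai_total = 0.40  # USD for the equivalent OpenAI usage
stories = 80

hf_per_story = hf_total / stories          # USD per story on HF
openai_per_story = openai_total / stories  # USD per story on OpenAI
ratio = hf_per_story / openai_per_story    # how much more HF costs

print(f"HF: ${hf_per_story:.3f}/story, "
      f"OpenAI: ${openai_per_story:.3f}/story, "
      f"ratio: {ratio:.1f}x")
```

Under these assumptions the gap works out to roughly 5x per story, which lines up with the "less competitive than commercial" impression below.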

Is the coming PRO "pay as you go" pricing likely to be that high?

Thanks, -Charlie Dolan

PS I really miss using LLaMa-3.3-70B, because it was very often right on the mark summarizing long, discursive news analyses and blog posts when GPT-4o, Sonnet 3.5, and Gemini 2.0 all whiffed.


If you need to use Llama ASAP, try Groq; they have a free tier you can use via API while HF fixes its API issues, which hopefully will be soon :crossed_fingers:


Hi, I just noticed the start of "pay as you go" for PRO.

(a) Thanks for the quick implementation. The time to update the total due on the billing page is super quick: just as fast as OpenAI, and faster than Google or Anthropic. Kudos.

(b) It looks like the cost is pretty close to what I saw before, so less competitive than the commercial providers (5-8x), but still reasonable given that Hugging Face provides a high-quality environment for models that are hard to access otherwise.

(c) With this pricing, I can continue to use my PRO subscription to access LLaMa-3.3-70B (or other open-source models), not as a primary summarization tool, but as a check when one of the hyperscalers whiffs on a new summary. Probably closer to what PRO is meant for ;->

(d) At some point, I will try an Inference Endpoint on Hugging Face to get a per-story cost for compute alone (after paying to spin up the instance).

(e) Thanks again!
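The per-story endpoint cost in point (d) is simple arithmetic once you know the instance's hourly rate and your throughput. A sketch with hypothetical numbers (the hourly rate and throughput below are placeholders, not actual Hugging Face Inference Endpoints pricing):

```python
# Hypothetical per-story compute cost on a dedicated Inference Endpoint.
# Both inputs are illustrative assumptions, not real HF pricing figures.
hourly_rate = 4.50       # USD/hour for the GPU instance (assumed)
stories_per_hour = 120   # summaries the endpoint processes per hour (assumed)

cost_per_story = hourly_rate / stories_per_hour
print(f"compute cost per story: ${cost_per_story:.4f}")
```

Note this only amortizes raw compute; idle time while the endpoint is spun up but not summarizing would raise the effective per-story cost.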


Hi, I finally got a chance to integrate the Inference API on the machines where I read news summaries, and I noticed the following behaviors: (1) I was required to re-request authorization for LLaMa-3.3-70B; (2) after it came through as ACCEPTED on the "Gated Repos Status" page, it took a few hours to flow through to the Inference API; and (3) it does not seem to charge anything when I check for updates on the Billing page under inference usage, i.e., it still shows the $2.74 balance due that I generated in my last experiment.

Is this the new behaviour, or will I see the charges eventually?

Thanks, -Charlie

PS I still have not tried an Inference Endpoint for a large number of summaries with LLaMa-3.3-70B.
