API inference limit changed?

Hi, I just noticed the start of “pay as you go” for PRO.

(a) Thanks for the quick implementation. The time to update the total due on the billing page is very fast, on par with OpenAI and faster than Google or Anthropic. Kudos.

(b) The cost looks pretty close to what I saw before, so less competitive than commercial offerings (5-8x), but still reasonable given that Hugging Face is non-profit and provides a high-quality environment for models that are otherwise hard to access.

(c) With this pricing, I can continue using my PRO subscription to access Llama-3.3-70B (or other open-source models), not as a primary summarization tool, but as a check when one of the hyperscalers whiffs on a new summary. Probably closer to what PRO is meant for ;->

(d) At some point, I will try an Inference Endpoint on Hugging Face to get a per-story cost for compute alone (after paying to spin up the instance).

(e) Thanks again!
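For point (d), the per-story compute cost on a dedicated endpoint is just the hourly instance rate divided by summarization throughput. A minimal sketch of that arithmetic, where the hourly rate and seconds-per-story figures are placeholder assumptions (not actual Hugging Face pricing):

```python
# Hypothetical back-of-the-envelope estimate of per-story cost on an
# always-on Inference Endpoint. The $/hour rate and seconds-per-story
# throughput are illustrative assumptions, not real pricing numbers.

def per_story_cost(hourly_rate_usd: float, seconds_per_story: float) -> float:
    """Cost per summarized story = hourly rate / stories processed per hour."""
    stories_per_hour = 3600.0 / seconds_per_story
    return hourly_rate_usd / stories_per_hour

# e.g. a $4/hour GPU instance that takes 20 seconds per summary
cost = per_story_cost(4.0, 20.0)
print(round(cost, 4))  # about $0.0222 per story
```

Note this ignores idle time: an endpoint billed while sitting idle between stories raises the effective per-story cost accordingly.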
