API inference limit changed?

Hi, I finally got a chance to integrate the Inference API on the machines where I read new summaries, and I noticed the following behaviors: (1) I was required to re-request authorization for LLaMa-3.3-70B; (2) after it came through as ACCEPTED on the “Gated Repos Status” page, it took a few hours to flow through to the Inference API; and (3) it does not seem to charge anything when I look for updates on the Billing page under inference usage, i.e., it still shows the $2.74 balance due that I generated in my last experiment.

Is this new behaviour, or will I see the charges eventually?

Thanks, -Charlie

PS I still have not tried an Inference Endpoint for a large number of summaries with LLaMa-3.3-70B.