I’ve recently started using the inference API and my dashboard shows that I’ve nearly used up my allowance.
Does this allowance refresh on a monthly basis or will I need to purchase a plan in order to keep using it?
Please clarify.
cc @michellehbn
Hi @Azuremis! Thanks for reaching out, and happy new year! For larger volumes of requests, or if you need guaranteed latency/performance, you can use our new solution, Inference Endpoints, to easily deploy your models on dedicated, fully managed infrastructure. Inference Endpoints gives you the flexibility to quickly create endpoints on CPU or GPU resources, and is billed by compute uptime rather than by character usage. Further pricing information can be found here. Our PRO subscription gives you higher Inference API rate limits than the free Inference API plan, and the limit allowance is refreshed monthly. Please let us know if you have any other questions! Thanks again!
Thank you @philschmid for raising this, and @michellehbn for clarifying how the inference endpoint refresh, rates and performance work. My query has been perfectly answered.