What should I do when I hit the rate limit?

{'error': [{'message': 'update vector: failed with status: 429 error: Rate limit reached. You reached PRO hourly usage limit. Use Inference Endpoints (dedicated) to scale your endpoint.'}]}

What should I do when the rate limit is reached?

Did you find a solution?

I solved this problem a while ago. Thanks!

Hi @Culture-and-Morality, thanks for posting! The free Inference API is a solution for easily exploring and evaluating models, while Inference Endpoints is our paid inference solution for production use cases. For larger volumes of requests, or if you need guaranteed latency/performance, we recommend using Inference Endpoints instead to deploy your models on dedicated, fully-managed infrastructure. Inference Endpoints give you the flexibility to quickly create endpoints on CPU or GPU resources, and are billed by compute uptime rather than character usage. Further pricing information can be found here.
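In the meantime, a simple retry loop with backoff can soften occasional 429s on the free API. This is only a minimal sketch: the model URL, the `HF_TOKEN` environment variable, and the retry counts below are placeholders, and for a dedicated Inference Endpoint you would point `API_URL` at the endpoint URL shown in its dashboard instead.

```python
import os
import time
import requests

# Placeholder model URL; for a dedicated Inference Endpoint, use its own URL instead.
API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2"
# Assumes your Hugging Face access token is exported as HF_TOKEN.
HF_TOKEN = os.environ["HF_TOKEN"]

def query_with_backoff(payload, max_retries=5):
    """POST to the Inference API, retrying with exponential backoff on HTTP 429."""
    headers = {"Authorization": f"Bearer {HF_TOKEN}"}
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 429:
            # Rate limit reached: wait 2, 4, 8, ... seconds, then retry.
            time.sleep(2 ** (attempt + 1))
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Still rate limited after retries; consider a dedicated Inference Endpoint.")

result = query_with_backoff({"inputs": "What should I do when I hit the rate limit?"})
print(result)
```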

Please let me know if you have additional questions. :hugs: