Trying to run an example from: https://huggingface.co/blog/llama31#inference-integrations
It works on smaller models, but for the 405B model the client freezes (I waited about 30 minutes).
Has anyone successfully queried this model? (I have a PRO account on HF.)
Update: now I get this:
Exception has occurred: HfHubHTTPError
429 Client Error: Too Many Requests for url: (link). Rate limit reached. You reached PRO hourly usage limit. Use Inference Endpoints (dedicated) to scale your endpoint.
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8/v1/chat/completions
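In case it helps anyone hitting the same 429: one workaround is to retry with exponential backoff until the hourly quota window resets. Below is a minimal, stdlib-only sketch; the `query` callable and the retry parameters are my own assumptions, and in a real script you would catch `huggingface_hub.utils.HfHubHTTPError` and check the status code instead of matching the message string.

```python
import time
import random

def call_with_backoff(query, max_retries=5, base_delay=1.0):
    """Retry `query` when it raises a rate-limit error (HTTP 429),
    sleeping exponentially longer between attempts.

    `query` is any zero-argument callable, e.g. a lambda wrapping
    client.chat_completion(...) from huggingface_hub (assumed usage).
    """
    for attempt in range(max_retries):
        try:
            return query()
        except Exception as e:
            # In practice: except HfHubHTTPError and check
            # e.response.status_code == 429 instead of string matching.
            if "429" not in str(e) or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter so retries spread out.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Note this only papers over short bursts; if the hourly PRO quota is exhausted, the retries will keep failing until the window resets, which is why the error message points at dedicated Inference Endpoints.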