Trying to run an example from: https://huggingface.co/blog/llama31#inference-integrations
It works on smaller models, but for the 405B model the client freezes (I waited about 30 minutes).
Has anyone successfully queried this model? (I have a PRO account on HF.)
Update: now I get this:
Exception has occurred: HfHubHTTPError
429 Client Error: Too Many Requests for url: (link). Rate limit reached. You reached PRO hourly usage limit. Use Inference Endpoints (dedicated) to scale your endpoint.
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8/v1/chat/completions
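In case it helps anyone hitting the same 429: one workaround is to retry with exponential backoff until the hourly quota window resets. Below is a minimal, stdlib-only sketch; the `query` callable and the retry parameters are my own assumptions, and in a real script you would catch `huggingface_hub.utils.HfHubHTTPError` and check the status code instead of matching the message string.

```python
import time
import random

def call_with_backoff(query, max_retries=5, base_delay=1.0):
    """Retry `query` when it raises a rate-limit error (HTTP 429),
    sleeping exponentially longer between attempts.

    `query` is any zero-argument callable, e.g. a lambda wrapping
    client.chat_completion(...) from huggingface_hub (assumed usage).
    """
    for attempt in range(max_retries):
        try:
            return query()
        except Exception as e:
            # In practice: except HfHubHTTPError and check
            # e.response.status_code == 429 instead of string matching.
            if "429" not in str(e) or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter so retries spread out.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Note this only papers over short bursts; if the hourly PRO quota is exhausted, the retries will keep failing until the window resets, which is why the error message points at dedicated Inference Endpoints.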