Cannot use Inference Provider. 429 error. First time usage

RuntimeError: Failed to generate response from together API: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/meta-llama/Meta-Llama-3.1-8B-Instruct?expand=inferenceProviderMapping

I tried using an Inference Provider (Together AI as the provider) via InferenceClient.
This is my first time using the service.
I get the error above.

I have paid for the Pro plan ($9/month).
I have also obtained access to Meta's Llama models, which require permission.
I have also added headers = {"X-wait-for-model": "true"} when initializing the InferenceClient() (rough sketch below).
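For context, this is roughly what I'm running (paraphrased from memory; it assumes a recent huggingface_hub that accepts the provider argument, and HF_TOKEN is just where I keep my token; the prompt is only a placeholder):

import os
from huggingface_hub import InferenceClient

# Rough sketch of my setup
client = InferenceClient(
    provider="together",
    api_key=os.environ["HF_TOKEN"],
    headers={"X-wait-for-model": "true"},
)

response = client.chat_completion(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)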

Yet I still can't use it at all?! Where does this rate limit come from?
What is even the point of paying for the Pro plan if I can't use these models at all??

I don’t know if it’s a problem with the Inference Provider or a lingering effect from the previous server malfunction…

Either way, it’s a problem with the paid service… @meganariley @michellehbn

Possibly this problem.

@John6666 Are you part of the Hugging Face team?

I went to check Together AI's page for their list of models and realized that their version of Llama 3.1 8B is meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo.

Is this why my InferenceClient cannot pull the model info from Hugging Face using this link: https://huggingface.co/api/models/meta-llama/Llama-3.1-8B-Instruct?expand=inferenceProviderMapping?

I have also checked the JSON object returned from this link. It turns out there is no Together AI among the providers. Is this the root cause of the 429 error?
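For reference, this is roughly how I inspected it (just fetching the URL and printing the expanded field; the field name here mirrors the expand parameter):

import requests

# Fetch the provider mapping that the client resolves before routing a request
url = (
    "https://huggingface.co/api/models/"
    "meta-llama/Llama-3.1-8B-Instruct?expand=inferenceProviderMapping"
)
resp = requests.get(url)
resp.raise_for_status()

data = resp.json()
# The expanded field should list the providers currently assigned to this model
print(data.get("inferenceProviderMapping"))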

Are you part of the Hugging Face team?

No.

The HF Discord, the forums, and the Hub are basically based on mutual assistance between users, and it is common at IT companies for staff to be sometimes present and sometimes absent. Whether that is a good thing or not is another question…

Is this the reason why

Even if the names are different, it seems the provider mappings themselves have been made. Perhaps only that provider has not been assigned to this model yet…?
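If Together really isn't in the mapping yet, one thing worth trying is pinning a provider that is listed for the model when creating the client. A minimal sketch, assuming a recent huggingface_hub that accepts the provider argument (the provider name below is only an example; substitute one that actually appears in the mapping):

from huggingface_hub import InferenceClient

# Example only: pin a provider that appears in the model's
# inferenceProviderMapping instead of letting the Hub pick one
client = InferenceClient(provider="fireworks-ai", api_key="hf_xxx")  # hf_xxx = your HF token

out = client.chat_completion(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hi."}],
    max_tokens=32,
)
print(out.choices[0].message.content)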

If you want to work around the rate-limit issue when using Hugging Face's InferenceClient with Together AI, you can force a delay between requests or switch to a different provider programmatically. Here's a workaround in Python:

import time
from huggingface_hub import InferenceClient

# Define model and headers
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
headers = {"X-wait-for-model": "true"}

# Function to retry requests with a forced delay between attempts
def robust_inference(prompt, retries=3, delay=10):
    # Recent huggingface_hub versions also accept provider="together" here
    client = InferenceClient(model=model_id, headers=headers)

    for attempt in range(retries):
        try:
            # text_generation sends the prompt to whichever endpoint the client resolves
            response = client.text_generation(prompt)
            return response  # Success
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if "429" in str(e):  # Too Many Requests error
                print(f"Rate limit hit, waiting {delay} seconds before retrying...")
                time.sleep(delay)
            else:
                break  # Other errors should not be retried

    return "Inference failed after multiple attempts."

# Example usage
response = robust_inference("Explain quantum physics in simple terms.")
print(response)

Explanation:

  1. Retries on failure – If it hits a 429 error, it waits and retries instead of failing outright.
  2. Forces a delay – Introduces a forced wait time (delay=10 seconds) between attempts.
  3. Allows switching providers – You could pass a different provider (rather than modifying model_id) to test against Fireworks AI, Nebius, or another backend; see the sketch after this list.
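For point 3, here is a minimal sketch of cycling through providers on repeated failures. It assumes a recent huggingface_hub that accepts the provider argument, and the provider names are only examples; check the model's inferenceProviderMapping for what is actually enabled:

from huggingface_hub import InferenceClient

# Example provider names; verify against the model's inferenceProviderMapping first
PROVIDERS = ["together", "fireworks-ai", "nebius"]

def generate_with_fallback(prompt, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    for provider in PROVIDERS:
        try:
            client = InferenceClient(provider=provider)
            out = client.chat_completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=128,
            )
            return out.choices[0].message.content
        except Exception as e:
            print(f"{provider} failed: {e}")  # e.g. a 429; try the next provider
    raise RuntimeError("All providers failed.")

print(generate_with_fallback("Explain quantum physics in simple terms."))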

This won't remove any rate limit imposed by Hugging Face or Together AI, but it ensures you handle the restriction more gracefully.

Let me know if you need tweaks or a different approach! 🚀