Meta-llama / Meta-Llama-3-70B-Instruct is not available as a serverless API

lenadan · September 26, 2024, 8:26am

Hi,
The serverless API documentation states that llama3-70B is available via the serverless API inference access to Pro users, however, I get an error whenever I try to use it in my code. My credentials are OK because other models work just fine. Also, in the web page of the model, the inference API also doesn’t work with the following error:
The model meta-llama/Meta-Llama-3-70B-Instruct is too large to be loaded automatically (141GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).

What can be the problem?

John6666 · September 26, 2024, 9:32am

If you try some other Pro-only model, we can tell if it’s a repo problem by whether it works or not. non-Pro models don’t make much sense because they will work even if you make a mistake sending the token. Any other good Pro-only models you could try?

lenadan · September 26, 2024, 9:43am

The problem is only with this specific model. Other Pro-only models work just fine (tried llama3-8B and llama3.1-70B)

John6666 · September 26, 2024, 10:01am

Thank you. If so, it’s definitely a repo configuration, content, or HF bug issue, and opening a Discussion is the fastest way to get the developers and repo admins to notice. Everyone involved in the organization will be notified. It’s the one that turns the icon on the home screen yellow. Everyone will notice immediately, except for those who didn’t read it in the first place.

If it wasn’t their fault, they would have taken it upon themselves to file a complaint with HF to resolve it.

lenadan · September 26, 2024, 10:58am

Thanks! I’ll open a discussion.

AK1 · September 26, 2024, 3:46pm

No, it hasn’t been working since yesterday evening, not even Llama 3.1 70B.

  const response = await hf.chatCompletion({
    model: "meta-llama/Llama-3.1-70B", // Use a valid LLaMA model ID
    // model: "mistralai/Mixtral-8x7B-Instruct-v0.1", // Use a valid LLaMA model ID
    messages: fullConversation,
    max_new_tokens: 15,
    temperature: 0.9,
    
  });

  const llamaMessage = {
    role: "user",
    content: response.choices[0].message.content.trim(),
  };

John6666 · September 26, 2024, 3:54pm

it hasn’t been working since yesterday evening,

That timing…maybe they made some changes to make this new HF feature work…?

AK1 · September 27, 2024, 1:20am

any update??

John6666 · September 27, 2024, 1:39am

They released this just in time. This one includes Llama-3.1-70B. Maybe something happened in the coordination for this, maybe it’s unrelated.

But still, so far I haven’t seen any response from the developer on Discussion.

nielsr · September 27, 2024, 6:31am

Thanks for reporting, have pinged the team.

osanseviero · September 28, 2024, 4:27pm

Hi all! This should be fixed now. Sorry for the inconvenience.

Topic		Replies	Views
Deployment of Meta Llama 3.1 70B Instruct Models	1	68	January 10, 2025
Unable to access Llama3.1 model despite having access granted Models	1	452	September 9, 2024
Are InferenceClient()'s down? Beginners	10	268	July 3, 2025
meta-llama/Llama-3.2-11B-Vision-Instruct did not reply 🤗Transformers	10	12908	October 29, 2024
API access no longer working despite Pro subscription 🤗Hub	6	767	April 12, 2024

Meta-llama / Meta-Llama-3-70B-Instruct is not available as a serverless API

Related topics