Hi,
The serverless API documentation states that Llama 3 70B is available to Pro users via serverless Inference API access, but I get an error whenever I try to use it in my code. My credentials are fine, because other models work without issue. The Inference API widget on the model's web page doesn't work either, failing with: The model meta-llama/Meta-Llama-3-70B-Instruct is too large to be loaded automatically (141GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).
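For context, here is a minimal sketch of the kind of call that fails (assuming the @huggingface/inference JS client, with my Pro access token in the HF_TOKEN environment variable):

import { HfInference } from "@huggingface/inference";

// Pro access token read from the environment.
const hf = new HfInference(process.env.HF_TOKEN);

// This call fails with the "too large to be loaded automatically" error.
const out = await hf.chatCompletion({
  model: "meta-llama/Meta-Llama-3-70B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 15,
});
console.log(out.choices[0].message.content);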
If you try some other Pro-only model, whether it works will tell us if this is a repo-specific problem. Non-Pro models aren't a useful test, because they work even if the token is sent incorrectly. Any other good Pro-only models you could try?
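Something along these lines would show at a glance which repos respond (a rough sketch; the model IDs are just candidate gated repos, swap in whichever ones your token has access to):

import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);

// Candidate models to probe; adjust to repos you are gated into.
const candidates = [
  "meta-llama/Meta-Llama-3-70B-Instruct",
  "meta-llama/Llama-3.1-70B-Instruct",
  "mistralai/Mixtral-8x7B-Instruct-v0.1",
];

for (const model of candidates) {
  try {
    await hf.chatCompletion({
      model,
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 5,
    });
    console.log(`${model}: OK`);
  } catch (err) {
    console.log(`${model}: FAILED (${err.message})`);
  }
}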
Thank you. In that case, it's definitely a repo configuration issue, a repo content issue, or an HF bug, and opening a Discussion on the repo is the fastest way to get the developers and repo admins to notice. Everyone in the organization is notified; it's the notification that turns the icon on the home screen yellow. Everyone will see it immediately, except those who wouldn't have read it in the first place.
And if the problem isn't on their end, they'll take it upon themselves to file a complaint with HF to get it resolved.
No, it hasn’t been working since yesterday evening, not even Llama 3.1 70B.
import { HfInference } from "@huggingface/inference";

// Pro access token read from the environment.
const hf = new HfInference(process.env.HF_TOKEN);

const response = await hf.chatCompletion({
  model: "meta-llama/Llama-3.1-70B-Instruct", // chat completion needs the Instruct variant, not the base model
  // model: "mistralai/Mixtral-8x7B-Instruct-v0.1", // alternative model that does work
  messages: fullConversation,
  max_tokens: 15, // chatCompletion takes max_tokens, not max_new_tokens
  temperature: 0.9,
});

// The reply comes from the model, so store it under the assistant role.
const llamaMessage = {
  role: "assistant",
  content: response.choices[0].message.content.trim(),
};