Hello, I am having the following two issues. I cannot run large models using the Inference API. For example, if I run the following,

import requests
API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neox-20b"
headers = {"Authorization": "Bearer <MY_API_KEY_HERE>"}
def query…

Cannot run large models using API token

radames February 22, 2024, 11:26pm 6

Hi @mandelakori, Zephyr is an LLM developed by our team, for which we’ve manually enabled inference. For other large models, we currently recommend using Inference Endpoints.
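For anyone hitting the same wall, here is a minimal sketch of how a client could tell "this model is too large for the hosted API" apart from other failures before switching to a dedicated endpoint. It is not an official client. It assumes the API returns a JSON body with an `error` field, and that the message contains the phrase "too large to be loaded automatically" (the wording reported in other threads). The helper names `is_too_large_error` and `query` are illustrative, and the sketch uses only the standard library rather than `requests`.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neox-20b"

# Assumption: substring of the error message returned for oversized models.
TOO_LARGE_MARKER = "too large to be loaded automatically"


def is_too_large_error(payload):
    """Return True if the decoded JSON looks like the 'model too large' error."""
    if not isinstance(payload, dict):
        return False  # successful generations usually come back as a list
    return TOO_LARGE_MARKER in str(payload.get("error", ""))


def query(url, token, inputs, timeout=30):
    """POST {"inputs": ...} to url and return the decoded JSON body.

    HTTP error responses are decoded too, so the caller can inspect "error".
    """
    request = urllib.request.Request(
        url,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        try:
            return json.loads(body)
        except json.JSONDecodeError:
            # Non-JSON error page: wrap it so callers always see a dict.
            return {"error": body or str(err)}
```

If `is_too_large_error(query(API_URL, token, "Hello"))` is true, the same `query` function can be pointed at the URL of an Inference Endpoint you have deployed yourself, since it takes the URL as a parameter.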

