Hi @AndreaSottana, that is a very large model, so it takes a long time to load on our Inference API.
The Inference API is intended for testing and evaluation. If you're looking for lower latency, you probably need our dedicated service, Inference Endpoints.
You can read more about how the Hub Inference API works here.
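For reference, here is a minimal sketch of calling the hosted Inference API over HTTP with the standard library; the model id and token are placeholders, and the 503/`estimated_time` retry note reflects how the API reports a model that is still loading:

```python
import json
import urllib.request

# Hosted Inference API endpoint pattern (model id is filled in per call)
API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Assemble the authenticated POST request for the Inference API."""
    url = API_URL.format(model_id=model_id)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def query(model_id: str, token: str, payload: dict):
    """Send the request and decode the JSON response.

    Large models may answer 503 with an `estimated_time` field while
    they load; callers can wait that long and retry.
    """
    req = build_request(model_id, token, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A very large model can take minutes to become available this way, which is why a dedicated Inference Endpoint is the better fit for latency-sensitive use.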