How to configure a model for Inference API?

I have been confused about how Inference API configuration works on Hugging Face. Some larger models, such as Llama-3-70B-Instruct, have the Inference API enabled (meta-llama/Meta-Llama-3-70B-Instruct · Hugging Face), while some smaller models, such as Phi-3-medium, do not (microsoft/Phi-3-medium-128k-instruct · Hugging Face). I believe the "Model is too large to load in Inference API (serverless)" message is just a default placeholder for models that aren't configured properly.

Why is that? And how can I properly set up a model for Inference API access?
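For context, this is roughly how I'm calling the serverless endpoint. This is just a sketch: the token is a placeholder, and I'm assuming the usual `https://api-inference.huggingface.co/models/<model_id>` URL pattern from the docs.

```python
# Minimal serverless Inference API call using only the standard library.
# The token value is a placeholder; the URL pattern is assumed from the HF docs.
import json
import urllib.request

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

def query(payload: dict, token: str) -> dict:
    """POST a JSON payload to the serverless Inference API and return the JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token, e.g. "hf_..."
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (needs a valid token and an API-enabled model):
# query({"inputs": "Hello!"}, token="hf_...")
```

This exact same call works for the Llama model but not for Phi-3-medium, which is what prompted the question.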

I saw that Phi-3 does have the pipeline and widget configs set up properly in its model card. Does the HF team have to approve a model for the Inference API behind the scenes?
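For reference, this is the kind of model card metadata I mean, i.e. the YAML front matter at the top of the README. The example prompt text here is made up by me; `pipeline_tag` and `widget` are the real metadata keys.

```yaml
# Model card YAML front matter (README.md)
pipeline_tag: text-generation
widget:
  - text: "Can you provide ways to eat combinations of bananas and dragonfruits?"
```

Phi-3-medium appears to have this in place already, yet the widget still shows the "too large" message, which is why I suspect something beyond the model card config is involved.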