How to configure a model for Inference API?

I have been confused about how Inference API configuration works on Hugging Face. Some larger models, such as Llama-3-70B-Instruct, have the Inference API enabled (meta-llama/Meta-Llama-3-70B-Instruct · Hugging Face), while some smaller models, such as Phi-3-medium, do not (microsoft/Phi-3-medium-128k-instruct · Hugging Face). I believe the "Model is too large to load in Inference API (serverless)" message is just a default placeholder for models that aren't configured properly.

Why is that? And how can I properly set up a model for Inference API access?
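For context, this is roughly how I'm calling the serverless endpoint. This is just a sketch: the token is a placeholder, and I'm assuming the usual `https://api-inference.huggingface.co/models/<model_id>` URL pattern from the docs.

```python
# Minimal serverless Inference API call using only the standard library.
# The token value is a placeholder; the URL pattern is assumed from the HF docs.
import json
import urllib.request

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

def query(payload: dict, token: str) -> dict:
    """POST a JSON payload to the serverless Inference API and return the JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token, e.g. "hf_..."
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (needs a valid token and an API-enabled model):
# query({"inputs": "Hello!"}, token="hf_...")
```

This exact same call works for the Llama model but not for Phi-3-medium, which is what prompted the question.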

I saw that Phi-3 does have the pipeline and widget configs set up properly in its model card. Does the HF team have to approve a model for the Inference API behind the scenes?
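For reference, this is the kind of model card metadata I mean, i.e. the YAML front matter at the top of the README. The example prompt text here is made up by me; `pipeline_tag` and `widget` are the real metadata keys.

```yaml
# Model card YAML front matter (README.md)
pipeline_tag: text-generation
widget:
  - text: "Can you provide ways to eat combinations of bananas and dragonfruits?"
```

Phi-3-medium appears to have this in place already, yet the widget still shows the "too large" message, which is why I suspect something beyond the model card config is involved.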