Difference between pinned models and Inference endpoints

popaqy · November 11, 2022, 8:33am

I am looking to deploy my model online and it seems like I have 2 options:

Pinned model
Inference Endpoint

What’s the difference between these two?

stevhliu · November 14, 2022, 4:05pm

Hi @popaqy, here is a very high-level overview:

Pinned models (just a model preloaded for inference) are available through the Inference API, but pinning is only supported for, and available to, existing paying customers. Otherwise, the Inference API is a free product.
Inference Endpoints are like the next iteration of pinned models: they build and deploy your model on its own secure Endpoint with cool autoscaling and security features. You can also choose your own CPU/GPU depending on your needs to keep costs low. Check this out if you need a production-ready environment for your model!
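To make the difference concrete from the client side, here is a minimal Python sketch (standard library only) that builds, but does not send, a request to each option. Assumptions: the `https://api-inference.huggingface.co/models/<model_id>` URL pattern and the `{"inputs": ...}` JSON payload reflect the Inference API as of late 2022 and may have changed since. The model id `popaqy/my-model`, the token `hf_xxx`, and the endpoint URL are hypothetical placeholders. The key point is that the Inference API URL is derived from the model id on the Hub, while an Inference Endpoint has its own dedicated URL, shown on that endpoint's page.

```python
import json
import urllib.request

# Shared Inference API URL pattern as of late 2022 (assumption; may have changed).
INFERENCE_API_BASE = "https://api-inference.huggingface.co/models/"


def build_request(url: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON {"inputs": ...} body."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def inference_api_request(model_id: str, token: str, text: str) -> urllib.request.Request:
    # Shared Inference API: the URL is derived from the model id on the Hub.
    return build_request(INFERENCE_API_BASE + model_id, token, text)


def endpoint_request(endpoint_url: str, token: str, text: str) -> urllib.request.Request:
    # Inference Endpoint: the URL is the one assigned to your own dedicated endpoint.
    return build_request(endpoint_url, token, text)


if __name__ == "__main__":
    # Hypothetical model id and token, for illustration only.
    req = inference_api_request("popaqy/my-model", "hf_xxx", "Hello!")
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it, which needs network access and a valid token.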

popaqy · November 17, 2022, 7:35am

Thank you very much for the answer.

I have a follow-up question:

For the Inference API, it is stated that the model runs on an Intel Ice Lake CPU, but the instance type is not explicitly mentioned. Can you tell me which of the following instances the Inference API uses?

stevhliu · November 17, 2022, 2:52pm

That part is entirely up to you! You can pick a smaller instance if you don’t anticipate needing a lot of compute or you can go for one of the larger instances if you need something more powerful.

If you’re interested, check out the Pricing docs to learn how costs are calculated for these resources.
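As a rough illustration of how instance choice feeds into cost, the sketch below assumes you are billed per minute of uptime at the instance's hourly rate, multiplied by the number of replicas. Both the per-minute billing model and the rates used here are assumptions (the rates are made-up placeholders, not real prices); check the Pricing docs for the actual rates and billing rules. The function name `endpoint_cost` is hypothetical.

```python
from decimal import Decimal


def endpoint_cost(hourly_rate_usd: str, minutes_running: int, replicas: int = 1) -> Decimal:
    """Estimated cost in USD, assuming per-minute billing of an hourly instance rate.

    The rate is passed as a string so Decimal keeps it exact (no float rounding).
    """
    rate = Decimal(hourly_rate_usd)
    return rate * minutes_running * replicas / 60


if __name__ == "__main__":
    # Hypothetical hourly rates for a "small" and a "large" instance (placeholders).
    small, large = "0.06", "1.30"
    month_minutes = 30 * 24 * 60  # endpoint left running for a 30-day month
    print("small, always on:", endpoint_cost(small, month_minutes))
    print("large, 3 hours:", endpoint_cost(large, 3 * 60))
```

Under this model, uptime matters as much as the hourly rate: a cheap instance left on all month can end up costing more than a powerful one used for a few hours.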
