Difference between pinned models and Inference endpoints

I am looking to deploy my model online and it seems like I have 2 options:

  1. Pinned model
  2. Inference Endpoint

What’s the difference between these two?

Hi @popaqy, here is a very high-level overview:

  1. Pinned models (a model kept preloaded for inference) are available through the Inference API, but pinning is only supported for existing paying customers. Otherwise, the Inference API is a free product 🙂
  2. Inference Endpoint is like the next iteration of pinned models, and it’ll build and deploy your model on its own secure Endpoint with cool autoscaling and security features. You can also choose your own CPU/GPU depending on your needs to keep costs low. Check this out if you need a production-ready environment for your model!
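
To make the comparison concrete, here is a minimal sketch of how a call to either option might look from the client side. It assumes the hosted Inference API serves models under `https://api-inference.huggingface.co/models/<model id>` and that a dedicated Endpoint accepts the same bearer-token JSON request at its own URL. The model id, the endpoint URL, and the token below are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: base URL of the hosted Inference API.
INFERENCE_API_BASE = "https://api-inference.huggingface.co/models"


def build_request(url: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hosted model."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Hypothetical model id, endpoint URL, and token, for illustration only.
api_request = build_request(
    f"{INFERENCE_API_BASE}/my-user/my-model", "hf_xxx", "Hello!"
)
endpoint_request = build_request(
    "https://my-endpoint.example.com", "hf_xxx", "Hello!"
)

# To actually send one of them (requires a valid token and network access):
# with urllib.request.urlopen(api_request) as response:
#     print(json.load(response))
```

Under these assumptions the client code is identical in both cases; only the URL changes, so the choice mostly comes down to where the model runs and who manages the hardware.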

Thanks a lot for the answer.

I have a follow-up question:

For the Inference API, it is said that the model runs on an Intel Ice Lake CPU, but the instance is not explicitly mentioned. Can you tell me which of the following instances the Inference API uses?

That part is entirely up to you! 🙂 You can pick a smaller instance if you don’t anticipate needing a lot of compute, or you can go for one of the larger instances if you need something more powerful.

If you’re interested, check out the Pricing docs to learn how costs are calculated for these resources.
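
As a rough way to compare instance sizes before reading the Pricing docs, here is a back-of-the-envelope sketch. It assumes cost scales as hourly rate × running hours × number of replicas; that formula is a simplifying assumption, and the rates used are made-up placeholders, not real prices.

```python
def estimate_cost(hourly_rate_usd: float, hours: float, replicas: int = 1) -> float:
    """Estimate cost, assuming billing is simply rate x hours x replicas."""
    return hourly_rate_usd * hours * replicas


HOURS_PER_MONTH = 24 * 30  # a 30-day month running around the clock

# Placeholder rates for illustration only; check the Pricing docs for real ones.
small_instance = estimate_cost(0.5, HOURS_PER_MONTH)
large_instance = estimate_cost(2.0, HOURS_PER_MONTH, replicas=2)

print(f"small: ${small_instance:.2f}/month, large: ${large_instance:.2f}/month")
```

Autoscaling changes the picture, since the replica count then varies over time, so treat this only as a starting point.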