I am looking to deploy my model online and it seems like I have 2 options:
- Pinned model
- Inference Endpoint
What’s the difference between these two?
Hi @popaqy, here is a very high-level overview:
Thank you very much for the answer.
I have a follow-up question:
For the Inference API, it is said that the model runs on an Intel Ice Lake CPU, but the instance type is not explicitly mentioned. Can you tell me which of the following instances the Inference API uses?
That part is entirely up to you! You can pick a smaller instance if you don’t anticipate needing much compute, or go for one of the larger instances if you need something more powerful.
If you’re interested, check out the Pricing docs to learn how costs are calculated for these resources.
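Whichever option you choose, querying the deployed model looks roughly the same: a POST request with a JSON payload and a bearer token. Here is a minimal sketch using only the standard library; the model id and token are hypothetical placeholders, and the serverless Inference API URL pattern is assumed — a dedicated Inference Endpoint would use the endpoint URL shown in its dashboard instead.

```python
import json
import urllib.request

def build_request(model_id: str, token: str) -> urllib.request.Request:
    """Build a POST request for the serverless Inference API.

    For a dedicated Inference Endpoint, swap the URL below for the
    endpoint URL from your dashboard; the headers stay the same.
    """
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    return urllib.request.Request(
        url,
        data=json.dumps({"inputs": "I love this!"}).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # hypothetical token
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical model id and token -- replace with your own.
req = build_request("distilbert-base-uncased-finetuned-sst-2-english", "hf_xxx")

# Sending it requires a valid token and network access:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same client code works for both options, which is convenient if you start on the shared Inference API and later move to a dedicated Inference Endpoint for more predictable performance.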