Hello team,
I’m using the Inference API to build a simple website. I’m on the Pro plan, and I’d like to know how many minutes the API keeps a model loaded in memory before off-loading it.
For example, when I request a model for the first time, the Inference API loads it into memory — after how many minutes of inactivity will it be off-loaded?
Thanks!