Misunderstanding about inference endpoint billing

Hi everyone!

I managed to deploy a model and connect it to my infrastructure. During testing, I ran inference on only 2 images. Each response took about 10-15 seconds, but I have been billed for at least 1 minute 40 seconds (I had to pause the endpoint to stop it).
The documentation states:
"Pay for compute resources uptime by the minute, billed monthly.

As low as $0.06 per CPU core/hr and $0.6 per GPU/hr."
which led me to assume that I am not billed as long as the endpoint is not being used. Am I wrong, or did I miss something?
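
To make the two readings concrete, here is a rough Python sketch of the arithmetic. The GPU rate comes from the quoted pricing text; rounding up to whole minutes, the 15 s per request upper bound, and the function name are my own assumptions, not something the documentation confirms.

```python
import math

GPU_RATE_PER_HOUR = 0.6  # USD, taken from the quoted pricing text


def cost_for_seconds(seconds: float, rate_per_hour: float = GPU_RATE_PER_HOUR) -> float:
    """Cost for a billed duration, charged by the minute.

    Rounding up to whole minutes is an assumption for illustration only.
    """
    billed_minutes = math.ceil(seconds / 60)
    return billed_minutes * rate_per_hour / 60


# Reading 1 (what I assumed): only the time spent answering requests is billed.
request_seconds = 2 * 15  # two inferences, using the upper bound of 10-15 s each
per_request_cost = cost_for_seconds(request_seconds)

# Reading 2 (what "uptime" seems to mean): the whole time the endpoint is running is billed.
uptime_seconds = 100  # the 1 minute 40 seconds shown on my bill
per_uptime_cost = cost_for_seconds(uptime_seconds)

print(f"billed on request time: ${per_request_cost:.2f}")
print(f"billed on uptime:       ${per_uptime_cost:.2f}")
```

If the second reading is the correct one, the cost depends on how long the endpoint stays up, not on how many requests I send.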
Good evening :slight_smile: