Is it possible to run and stop an endpoint using code/API to avoid being billed when it is not in use?

I want to use the HF inference endpoint for my project, but since the model will be used for just a few hours per day, I want to launch the endpoint and stop it within the same day. Is it possible?

Hi @smartinezbragado, currently it’s not possible, but implementing it is a short-term milestone.

Happy to let you know we’ve just made this possible: Pause and Resume your Endpoint.
You can Pause/Resume as often as you’d like to only be billed while you need the model.


Thanks @radames and @ronvolutional. I have been reading the API documentation and did not find anything to pause and resume the endpoint (only a way to scale it down to 0). Is it possible to pause/resume it through the API, or only manually?

Thanks in advance :slight_smile:

Just copying and pasting @philschmid’s response from Discord here:

curl --request PUT \
 --url \
 --header 'Authorization: Bearer TOKEN' \
 --header 'Content-Type: application/json' \
 --data '{
 "compute": {
  "scaling": {
   "minReplica": 0,
   "maxReplica": 0
  }
 }
}'

This does not work anymore. I get a 400 error.

The dashboard gets the exact same error when clicking on the “Stop endpoint” button.
I guess it’s linked to the new “Automatic Scale-to-Zero” option.

Any idea on how to pause/resume endpoints now?

Hello, we’ve indeed changed the way you pause/stop an endpoint. There are two new routes, /pause and /resume.

You can check the Swagger docs here. Let me know if there is anything else I can do to help.
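For anyone who wants to script this, here is a minimal Python sketch of calling the /pause and /resume routes mentioned above. It is only an illustration: the helper name `build_endpoint_action`, the example base URL, and the choice of `POST` as the HTTP method are assumptions that do not come from this thread, so please confirm the exact path and method in the Swagger docs before relying on it.

```python
import urllib.request


def build_endpoint_action(base_url: str, action: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for the /pause or /resume route.

    base_url is the API URL of a single endpoint. Its exact layout is an
    assumption here; take the real one from the Swagger docs.
    """
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported action: {action!r}")
    # Append the route name to the endpoint URL, avoiding a double slash.
    url = f"{base_url.rstrip('/')}/{action}"
    return urllib.request.Request(
        url,
        method="POST",  # assumption: check the Swagger docs for the actual verb
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Hypothetical endpoint URL and token, for illustration only.
    request = build_endpoint_action(
        "https://example.invalid/endpoint/my-endpoint", "pause", "TOKEN"
    )
    print(request.get_method(), request.full_url)
    # To actually send it, uncomment the next line:
    # urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers first, which helps when debugging errors like the 400 reported earlier in this thread.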