I want to use the HF inference endpoint for my project, but since the model will be used for just a few hours per day, I want to launch the endpoint and stop it within the same day. Is that possible?
Hi @smartinezbragado, currently it’s not possible, but implementing it is a short-term milestone.
Happy to let you know we’ve just made this possible! See: Pause and Resume your Endpoint
You can Pause/Resume as often as you’d like to only be billed while you need the model.
Thanks @radames and @ronvolutional. I have been reading the API documentation and did not find anything to pause and resume the endpoint (only to scale it down to 0). Is it possible to pause/resume it through the API, or only manually?
Thanks in advance
Just copying and pasting @philschmid’s response from Discord here:
curl --request PUT \
  --url https://api.endpoints.huggingface.cloud/endpoint/ENDPOINT-NAME \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "compute": { "scaling": { "minReplica": 0, "maxReplica": 0 } } }'
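If you would rather script this than run curl by hand, here is a rough Python sketch of the same PUT request using only the standard library. The URL, headers, and JSON body are copied from the curl command above; the function names (`build_scaling_request`, `scale_to_zero`) are hypothetical helpers made up for this example, and the shape of the response is not described in this thread, so the sketch just returns the status code and raw body.

```python
import json
import urllib.request

# Base URL taken from the curl command in this thread.
API_BASE = "https://api.endpoints.huggingface.cloud/endpoint"


def build_scaling_request(endpoint_name, token, min_replica, max_replica):
    """Build (but do not send) the PUT request that sets the replica range.

    Hypothetical helper: the payload mirrors the curl --data body.
    """
    payload = {
        "compute": {
            "scaling": {"minReplica": min_replica, "maxReplica": max_replica}
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_name}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def scale_to_zero(endpoint_name, token):
    """Send the request with both replica bounds set to 0, as in the curl call."""
    request = build_scaling_request(endpoint_name, token, 0, 0)
    with urllib.request.urlopen(request) as response:
        # The response format is an unknown here, so return it unparsed.
        return response.status, response.read().decode("utf-8")
```

Calling `scale_to_zero("ENDPOINT-NAME", "TOKEN")` with your own endpoint name and token should send the same request as the curl command. Passing other replica counts to `build_scaling_request` may be a way to bring the endpoint back up, but that is an assumption and not something confirmed in this thread.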