From Hugging Face Discord:
Tom Aarsen
Hello! I believe some of the inference endpoints currently have “Scale to zero” enabled temporarily, meaning they will shut down after a period of no usage. The first request after that will then be slow or fail, but subsequent ones will work. We're going to disable scale to zero again so that this is no longer an issue, apologies for the inconvenience. cc @ VB can you update the scale to zero for the big ST models that already had APIs?
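Until scale to zero is disabled, the behavior described above (first request slow or failing while the endpoint cold-starts, later requests succeeding) can be worked around client-side with a simple retry loop. A minimal sketch, assuming a generic `request_fn` callable that raises while the endpoint is still waking up; the function name and parameters are illustrative, not part of any Hugging Face API:

```python
import time


def call_with_retry(request_fn, max_retries=5, delay=2.0):
    """Retry a request while a scale-to-zero endpoint cold-starts.

    request_fn should raise an exception while the endpoint is still
    waking up (e.g. an HTTP 503); the first successful result is returned.
    """
    last_exc = None
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception as exc:  # endpoint still scaling up
            last_exc = exc
            time.sleep(delay)
    raise RuntimeError(
        f"endpoint still unavailable after {max_retries} attempts"
    ) from last_exc
```

Once the endpoint is warm, the first attempt succeeds and the loop adds no overhead.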