I have a pinned model (shaxpir/prosecraft_resumed_ft2) on the Inference API that has been working well for over a year, but it recently stopped working. When I make a request like this:
curl -i -X POST https://api-inference.huggingface.co/models/shaxpir/prosecraft_resumed_ft2 \
  -H "Authorization: Bearer <REDACTED>" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Once upon a time,",
    "options": {
      "use_gpu": true,
      "use_cache": false
    },
    "parameters": {
      "return_full_text": false,
      "num_return_sequences": 1,
      "temperature": 1.0,
      "top_p": 0.9,
      "max_new_tokens": 250
    }
  }'
I get a 503 response telling me that the model is currently loading:
HTTP/2 503
date: Mon, 24 Apr 2023 19:17:42 GMT
content-type: application/json
content-length: 91
x-request-id: JcowSHvjgHSgCiowln3Zm
access-control-allow-credentials: true
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
{
  "error": "Model shaxpir/prosecraft_resumed_ft2 is currently loading",
  "estimated_time": 20.0
}
But the model never seems to fully load, and the “estimated_time” never changes.
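For what it's worth, retrying doesn't get past it either. A polling loop along these lines (a minimal Python sketch; the helper name, retry count, and fallback delay are my own choices, not anything from the API docs) just keeps receiving the same 503 with the same estimated_time:

```python
import time


def wait_until_loaded(post_fn, max_attempts=10, fallback_delay=20.0):
    """Poll post_fn until the 503 'currently loading' response clears.

    post_fn is any callable returning a response object with
    .status_code and .json(). Returns the first non-503 response,
    or None if the model never finishes loading.
    """
    for _ in range(max_attempts):
        resp = post_fn()
        if resp.status_code != 503:
            return resp
        # The error body suggests how long to wait before retrying.
        body = resp.json()
        time.sleep(body.get("estimated_time", fallback_delay))
    return None


# Usage against the endpoint above (assumes a valid token):
#   import requests
#   url = "https://api-inference.huggingface.co/models/shaxpir/prosecraft_resumed_ft2"
#   headers = {"Authorization": "Bearer <REDACTED>"}
#   resp = wait_until_loaded(lambda: requests.post(
#       url, headers=headers, json={"inputs": "Once upon a time,"}))
```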
Can you help?