We’ve pinned models (both via the API and via the dashboard at Dashboard - Hosted API - HuggingFace), but we still get “currently loading” errors when making Inference API calls.
One example: model https://huggingface.co/redwoodresearch/redwood_deberta-v3-sift_82b19d290a74410caa804fa47e94a80b (private but we can make it public if that would help). It’s currently (supposedly) pinned but still requires a minute of warmup after a period of inactivity.
Let me know if there’s anything we should do differently!
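For reference, here’s roughly what our call looks like (a minimal sketch; the token is a placeholder, and we’re using the documented `wait_for_model` option as a workaround, which blocks until the model is loaded instead of returning the 503 “currently loading” error — though with a pinned model we’d expect not to need it):

```python
import requests

# Model from the post above; token below is a placeholder.
API_URL = (
    "https://api-inference.huggingface.co/models/"
    "redwoodresearch/redwood_deberta-v3-sift_82b19d290a74410caa804fa47e94a80b"
)
HEADERS = {"Authorization": "Bearer hf_xxx"}  # placeholder token

def build_payload(text):
    # "wait_for_model": True tells the Inference API to hold the request
    # until the model is loaded, rather than failing with
    # {"error": "Model ... is currently loading"}.
    return {"inputs": text, "options": {"wait_for_model": True}}

# Actual call (commented out here so the sketch runs offline):
# resp = requests.post(API_URL, headers=HEADERS, json=build_payload("some input"))
# resp.raise_for_status()
# print(resp.json())
```

With `wait_for_model` the first request after idle time just takes the ~1 minute warmup instead of erroring, but that warmup is exactly what pinning is supposed to avoid.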
@Narsil Seems like you’ve worked on pinned models before - any chance you could take a look?
Were you able to solve this problem?
Hey, same problem here. The pinned model always has to load again after sitting idle for a while. Any solutions?
I have the same problem after pinning the model allenai/tk-instruct-11b-def; it never even loaded. It also doesn’t work from the model’s page on Hugging Face, which may suggest the problem is on Hugging Face’s side.
On the other hand, I tried pinning the smaller 3B version and it worked like a charm.
Pinning actually works. It’s just that this model is too big to be loaded by default.
What’s actually failing is the detection that this model is too big for the machines we’re using. To run these large models you need to discuss it with us first, since they require different hardware than the standard setup.
Hope that answers your questions.