We’ve pinned models (both using the API call and using the dashboard at Dashboard - Hosted API - HuggingFace), but still get “currently loading” errors when we try to make inference API calls.
I have the same problem when pinning the model allenai/tk-instruct-11b-def, never even loaded. However, it is not working in the model’s page at hugging face which may suggest it is a problem on HuggingFace’s side.
On the other hand, I tried pinning a smaller model (the 3B) version and it worked like a charm
Actually pinning works. It’s just that this model is too big to be loaded by default.
What is actually failing is the detection that this model is too big to be loaded by the machines we’re using. In order to run these large models we need to discuss it (Since we need different hardware than the standard one).