Pinned model still needs to load

Hello,
I have a pinned model. After a short idle period, the Inference API still needs to load it, i.e. it returns the message ‘Model <username>/<model_name> is currently loading’. This is not supposed to happen, right? As I understand it, avoiding this is the whole purpose of pinning models.

I have confirmed that it is indeed pinned with the following code:

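    import requests

    # Replace the placeholder with a real access token.
    request_headers = {'Authorization': 'Bearer {}'.format('<huggingface_token>')}
    pin_url = "https://api-inference.huggingface.co/usage/pinned_models"
    # Returns the pinned models for this account.
    response = requests.get(pin_url, headers=request_headers)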

The model is called through the following code:

    import json

    api_endpoint = 'https://api-inference.huggingface.co/models/<username>/<model_name>'
    # payload is the request body, e.g. a dict with an 'inputs' key
    data = json.dumps(payload)
    response = requests.post(api_endpoint, headers=request_headers, data=data)
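
As a stopgap, the loading response can be retried on the client side. The following is only a minimal sketch, assuming the API answers with HTTP 503 and an `estimated_time` field in the JSON body while the model is loading. `call_with_retry` and `send_request` are hypothetical names for illustration, not part of any library.

```python
import time


def call_with_retry(send_request, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call send_request() until the response is not a 'model loading' error.

    send_request is a hypothetical zero-argument callable that returns an
    object with .status_code and .json(), e.g. a lambda wrapping requests.post.
    Assumption: a loading model is signalled by HTTP 503.
    """
    response = None
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 503:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts, hand back the last 503 response
        try:
            body = response.json()
        except ValueError:
            body = {}
        # Assumption: the body may carry an 'estimated_time' in seconds.
        wait = default_wait
        if isinstance(body, dict):
            wait = body.get("estimated_time", default_wait)
        sleep(min(float(wait), 30.0))  # cap each wait at 30 seconds
    return response
```

With the variables above, it could be used as `response = call_with_retry(lambda: requests.post(api_endpoint, headers=request_headers, data=data))`. This only hides the symptom; it does not explain why a pinned model gets unloaded.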

I feel like I have followed everything in the documentation and don’t understand why it isn’t working.

Thank you in advance for any answers!

We’re encountering the same issue.

Still not solved.