Pinned model still needs to load

I have a model pinned. After a short idle period, the Inference API still has to load the model, i.e. it returns the message ‘Model <username>/<model_name> is currently loading’. This is not supposed to happen, right? As I understand it, avoiding this load step is the whole purpose of pinning a model.

I have confirmed that it is indeed pinned with the following code:

    import requests

    request_headers = {
        'Authorization': 'Bearer {}'.format(<huggingface_token>)
    }
    pin_url = ""
    response = requests.get(pin_url, headers=request_headers)

The model is called through the following code:

    api_endpoint = '<username>/<model_name>'
    data = json.dumps(payload)
    response = requests.request('POST',
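
As a stopgap while the model keeps unloading, a retry wrapper around the call above might look like the sketch below. It assumes the "currently loading" reply arrives as HTTP 503 with an optional `estimated_time` field (in seconds) in its JSON body, which is my reading of the Inference API behaviour rather than something confirmed here. The names `call_until_loaded` and `send_request` are made up for illustration, and this does not explain why a pinned model unloads in the first place.

```python
import time


def call_until_loaded(send_request, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call send_request() until the reply is no longer a 503 'loading' reply.

    send_request is a zero-argument callable returning a requests-style
    response object (it needs .status_code and .json()).
    """
    response = None
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 503:
            # Anything other than 503 (success or a real error) goes back to the caller.
            return response
        # Assumption: the loading reply carries 'estimated_time' in seconds.
        try:
            wait = float(response.json().get("estimated_time", default_wait))
        except (ValueError, AttributeError, TypeError):
            wait = default_wait
        if attempt < max_attempts - 1:
            sleep(min(wait, 60.0))  # cap each pause at one minute
    # Still loading after max_attempts: return the last 503 reply.
    return response


# Example wiring with requests (api_url is a placeholder, not taken from the post):
# response = call_until_loaded(
#     lambda: requests.post(api_url, headers=request_headers, data=data)
# )
```

Passing the request as a callable keeps the wrapper independent of how the URL and headers are built, so the existing call can be dropped into the lambda unchanged.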

I feel like I have followed everything in the documentation and don’t understand why it isn’t working.

Thank you in advance for any answers!

We’re encountering the same issue.