Hello,
I have a pinned model. After a short idle period, the Inference API still has to load the model: it returns the message ‘Model <username>/<model_name> is currently loading’. This is not supposed to happen, right? As I understand it, avoiding this load is the whole purpose of pinning a model.
I confirmed that the model is indeed pinned with the following code:
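import requests
huggingface_token = "<huggingface_token>"  # placeholder for the API token
request_headers = {
    'Authorization': 'Bearer {}'.format(huggingface_token)
}
# List the models currently pinned for this account
pin_url = "https://api-inference.huggingface.co/usage/pinned_models"
response = requests.get(pin_url, headers=request_headers)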
The model is called through the following code:
import json
api_endpoint = 'https://api-inference.huggingface.co/models/<username>/<model_name>'
data = json.dumps(payload)  # payload holds the model inputs
response = requests.request('POST',
                            api_endpoint,
                            headers=request_headers,
                            data=data)
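To make the symptom concrete, here is a minimal sketch of how the loading response could be detected and retried on the client side. It is a workaround, not a fix for the pinning problem. The helper names `is_loading` and `query_with_retry` are hypothetical, the `'error'` key and the `'estimated_time'` field are assumptions about the shape of the JSON body, and the only thing taken from the actual response is the 'currently loading' text quoted above.

```python
import json
import time


def is_loading(body):
    """Return True if the decoded JSON body looks like the 'currently loading' error."""
    # Assumption: the error arrives as a JSON object with an 'error' string.
    error = body.get("error", "") if isinstance(body, dict) else ""
    return isinstance(error, str) and "currently loading" in error


def query_with_retry(send, payload, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send(json_string) until the body is no longer a loading error.

    send is any callable that posts the JSON string and returns the decoded body.
    Returns the last body received, even if it is still a loading error.
    """
    body = None
    for attempt in range(max_attempts):
        body = send(json.dumps(payload))
        if not is_loading(body):
            return body
        if attempt < max_attempts - 1:
            # Assumption: the body may carry an 'estimated_time' hint in seconds.
            sleep(body.get("estimated_time", default_wait))
    return body
```

With the code above, `send` could be something like `lambda data: requests.post(api_endpoint, headers=request_headers, data=data).json()`. Passing the sender in as a parameter keeps the retry logic independent of the HTTP library.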
I feel like I have followed everything in the documentation and don’t understand why it isn’t working.
Thank you in advance for any answers!