Hi everyone!
For the past few weeks I've been fine-tuning a model. It turned out too slow at inference, so I used "gguf-my-repo" to quantize it to a 4-bit GGUF.
I have previously deployed models on Inference Endpoints, but never a GGUF model.
When spinning up the server, I left most of the configuration at its defaults.
The server spins up fine and shows "Running". To test the endpoint, I send a POST request from Postman and get the following error:
{"error": {"code": 404, "message": "File Not Found", "type": "not_found_error"}}
The error doesn't seem to refer to the model file itself, since I can see the file has been correctly identified in the Inference Endpoints UI.
I've even tried changing the request URL in Postman to '/chat/completions', but the error persists…
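For reference, this is roughly the request I'm sending (endpoint URL, token, and payload values below are placeholders, not my real ones):

```python
import json
import urllib.request

# Placeholders -- my actual endpoint URL and HF token differ.
ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud/chat/completions"
HF_TOKEN = "hf_xxx"

# OpenAI-style chat payload, which is what I assume the endpoint expects.
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(req)  # commented out: needs a live endpoint
```

This mirrors what Postman sends; the response body is the 404 JSON above.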
I'm lost; any help would be appreciated!