Inference API (serverless) Endpoint

The Serverless Inference API failed to load my large model, reporting that the model is too large. To work around this, I quantized the model with the bitsandbytes package and uploaded the quantized version to the Hub. The inference endpoint for it is now visible; however, when I try to query it, I receive the following error message: “No package metadata was found for bitsandbytes.”
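
For context, here is a minimal sketch of the quantize-and-upload step I am describing (the model IDs are placeholders, and I am assuming 8-bit quantization via transformers' BitsAndBytesConfig):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization with bitsandbytes (load_in_4bit=True works similarly)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "my-org/my-large-model"  # placeholder for the original model repo

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Push the quantized weights to a new Hub repo
model.push_to_hub("my-org/my-large-model-8bit")
tokenizer.push_to_hub("my-org/my-large-model-8bit")
```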

Does this error mean the Serverless endpoint simply cannot run a bitsandbytes-quantized model, or is there a way to install the bitsandbytes package within the Hugging Face infrastructure? Any clarification would be appreciated.
