Inference API (serverless) Endpoint

The Serverless Inference API failed to load my large model, reporting that the model is too large. To work around this, I quantized the model with the bitsandbytes package and uploaded the quantized version to the Hub. The inference endpoint for it is now visible; however, when I try to query it, I receive the following error message: “No package metadata was found for bitsandbytes.”
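
For context, here is a minimal sketch of the quantize-and-upload step I am describing (the model IDs are placeholders, and I am assuming 8-bit quantization via transformers' BitsAndBytesConfig):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization with bitsandbytes (load_in_4bit=True works similarly)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "my-org/my-large-model"  # placeholder for the original model repo

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Push the quantized weights to a new Hub repo
model.push_to_hub("my-org/my-large-model-8bit")
tokenizer.push_to_hub("my-org/my-large-model-8bit")
```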

Does this error mean the Serverless endpoint simply cannot run a bitsandbytes-quantized model, or is there a way to install the bitsandbytes package within the Hugging Face infrastructure? Any clarification would be appreciated.
