Inference API for fine-tuned model not working: No package metadata was found for bitsandbytes

I have fine-tuned a model using QLoRA (4-bit) and then pushed it to the hub. Then, I tried to test it with inference API, but I get the following error:
No package metadata was found for bitsandbytes

The model:

  • base model is meta-llama/Llama-2-7b-chat-hf
  • fine-tuned on a custom dataset of 50 samples (I am just testing around)
  • I made the model public and can be found here
  • this is the colab notebook I used to train it. Note that, after the QLoRA training, I merged the adapter model with the base model, THEN pushed the result to the hub. So the model in the hub is a transformer model. Specifically, a transformers.models.llama.modeling_llama.LlamaForCausalLM
  • this is a colab notebook that can be used for testing. Note that the test works for the base model meta-llama/Llama-2-7b-chat-hf

My suspect is that the docker container behind the inference API does not know that it needs to install bitsandbytes. Is there a way to “tell it”? maybe a tag in the README?


Getting the same error, have you found anything related to the issue?


Unfortunately not. Currently I just gave up :frowning: But let me know if you find something.

1 Like

yeah, same. I tried including requirements.txt, every pip install in that text file, but then too the same error. Don’t know how to figure it out.


I have the same problem. Might this be an issue on the server side? I hope someone from HF looks into it.

1 Like

facing the same issue here.

Did anyone find a solution for this?

Same Issue

The same here. Any advice?

Hello, I had a similar issue recently.

First, make sure to have a requirements file with the packages listed.
Also, check that you have a torch version compatible with bitsandbytes.

Finally, if working with docker, you can either specify it into the requirements file or directly in the docker file by writing something like:
RUN pip install -U bitsandbytes

You can check locally by building the image with docker run to check how the building goes.

I have the same problem and haven’t found any solutions. Did anyone find any solutions? This is so frustrating

I went through the exact same process (fine-tune, merge, push).
Same problem here and same error using inference API.
I have also tried to deploy an inference endpoint and getting another error.
Calling cuda()is not supported for4-bitor8-bit quantized models.

Not sure what is going on here.

Same issue. Have a requirements.txt file installed as well.

Same issue , same message on inference API (No package metadata was found for bitsandbytes). Tried to add transformers in requirements.txt, not working neither

import torch
import transformers
model_id = “meta-llama/Meta-Llama-3-8B”

nf4_config = transformers.BitsAndBytesConfig(

model_nf4 = transformers.AutoModelForCausalLM.from_pretrained(model_id,

same issue here as well