Inference API for fine-tuned model not working: No package metadata was found for bitsandbytes

I have fine-tuned a model using QLoRA (4-bit) and then pushed it to the hub. Then, I tried to test it with inference API, but I get the following error:
No package metadata was found for bitsandbytes

The model:

  • base model is meta-llama/Llama-2-7b-chat-hf
  • fine-tuned on a custom dataset of 50 samples (I am just testing around)
  • the model is public and can be found here
  • this is the colab notebook I used to train it. Note that, after the QLoRA training, I merged the adapter model with the base model and THEN pushed the result to the hub. So the model on the Hub is a plain transformers model, specifically a transformers.models.llama.modeling_llama.LlamaForCausalLM
  • this is a colab notebook that can be used for testing. Note that the test works for the base model meta-llama/Llama-2-7b-chat-hf

My suspicion is that the Docker container behind the Inference API does not know that it needs to install bitsandbytes. Is there a way to “tell it”? Maybe a tag in the README?
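For what it’s worth, that error message matches the one importlib.metadata raises when a package is not installed, which is how transformers checks for optional dependencies like bitsandbytes. A small sketch to reproduce the check locally (the helper name is mine):

```python
# Sketch: reproduce the kind of availability check transformers performs.
# "No package metadata was found for <name>" is the message carried by
# importlib.metadata.PackageNotFoundError.
import importlib.metadata

def has_bitsandbytes() -> bool:
    """Return True if bitsandbytes package metadata is visible in this env."""
    try:
        importlib.metadata.version("bitsandbytes")
        return True
    except importlib.metadata.PackageNotFoundError:
        return False

print(has_bitsandbytes())  # False in an environment without bitsandbytes installed
```

If this prints False inside the container, the package (or its metadata) is missing from the environment that serves the model.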

3 Likes

Getting the same error, have you found anything related to the issue?

2 Likes

Unfortunately not. Currently I just gave up :frowning: But let me know if you find something.

1 Like

Yeah, same. I tried including a requirements.txt with every needed pip install listed in it, but I still got the same error. I don’t know how to figure it out.

2 Likes

I have the same problem. Might this be an issue on the server side? I hope someone from HF looks into it.

1 Like

Facing the same issue here.

Did anyone find a solution for this?

Same issue.

The same here. Any advice?

Hello, I had a similar issue recently.

First, make sure you have a requirements.txt file with the needed packages listed.
Also, check that your torch version is compatible with bitsandbytes.
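For reference, a requirements.txt along these lines (the version pins are illustrative assumptions, not tested values; pick versions that match your CUDA setup):

```
# Illustrative pins only; check bitsandbytes release notes for supported torch versions.
torch>=2.0
transformers
accelerate
bitsandbytes
```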

Finally, if you are working with Docker, you can either add bitsandbytes to the requirements file or install it directly in the Dockerfile with a line like:
RUN pip install -U bitsandbytes
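A minimal Dockerfile around that line might look like this (the base image, file names, and entrypoint are assumptions; adapt them to your setup, and use a CUDA-enabled base image if you need GPU support):

```dockerfile
# Assumed base image; swap in one matching your CUDA/torch requirements.
FROM python:3.10-slim

WORKDIR /app

# Install the listed dependencies first so this layer caches well.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Ensure bitsandbytes is present even if it is missing from requirements.txt.
RUN pip install -U bitsandbytes

COPY . .
CMD ["python", "app.py"]
```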

You can check locally by building the image with docker build and then running it with docker run, to verify that the installation succeeds and the model loads.