How do I load StarCoder2 quantized to 4 bits?

Hi folks,

I’m trying to play around with the new StarCoder2 model. I got it working in 16-bit, but I’m hoping to run it on a machine with only a few GB of memory available, so I’m experimenting with 4-bit quantization, just following these instructions.

However, I get an import error whenever I try to load the model:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
>>> qc = BitsAndBytesConfig(load_in_4bit=True)
>>> checkpoint = "bigcode/starcoder2-3b"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=qc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/james/git/starcoder2/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/home/james/git/starcoder2/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3029, in from_pretrained
    hf_quantizer.validate_environment(
  File "/home/james/git/starcoder2/venv/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise ImportError(
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

bitsandbytes and accelerate are both installed, since I used the repo’s requirements.txt. Any idea what could be the issue here?
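To rule out a broken virtualenv on my end, here’s a quick sanity check I ran (just a sketch of how I verified it; `importlib.util.find_spec` reports whether a package is importable without actually importing it):

```python
# Check that the packages the ImportError mentions are visible to
# this interpreter, without triggering their import-time side effects.
import importlib.util

for pkg in ("accelerate", "bitsandbytes", "transformers"):
    spec = importlib.util.find_spec(pkg)
    print(f"{pkg}: {'found' if spec else 'MISSING'}")
```

Both `accelerate` and `bitsandbytes` come back as found for me, so the packages themselves seem to be in place.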

I’m using Ubuntu 22.04 with an AMD Ryzen 5 2400G (no dedicated GPU in this machine, just trying to run on CPU).

Thanks for any help you can provide!

Hello there, I’m running into the same issue. Have you found a solution?