AutoModelForCausalLM error with accelerate and bitsandbytes

I use Google Colab but connected locally to my computer through Jupyter. I have Windows 10, an RTX 3070, and no CUDA or cuDNN because I didn't succeed in making them work :frowning:

Reproduction

!pip install transformers trl accelerate torch bitsandbytes peft datasets -qU
!pip install flash-attn --no-build-isolation

from datasets import load_dataset

instruct_tune_dataset = load_dataset("mosaicml/instruct-v3")

...

model_id = "mistralai/Mixtral-8x7B-v0.1"

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization config for bitsandbytes
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the model in 4-bit with FlashAttention-2
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=nf4_config,
    use_cache=False,
    attn_implementation="flash_attention_2",
)

Error message:
ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes

Expected behavior

This code works when using the Google Colab free T4 GPU but doesn't when I run it locally, even though I have installed bitsandbytes and accelerate.

Please help me :slight_smile:
Thank you very much!

Hi @altrastorique, bitsandbytes quantization requires a working CUDA setup to run, so the error you are facing is most likely due to the missing CUDA/cuDNN installation you mentioned.
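
A quick way to confirm this is to check whether your local PyTorch build can see the GPU at all. On Windows, a plain pip install torch typically pulls a CPU-only wheel, in which case bitsandbytes (and flash-attn) cannot find CUDA. A minimal check, assuming torch is already installed:

import torch

print(torch.__version__)                  # a version like "2.1.0+cpu" means a CPU-only build
print(torch.cuda.is_available())          # must print True for bitsandbytes to work
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report your RTX 3070

If is_available() returns False, reinstalling PyTorch from the CUDA wheel index (for example, pip install torch --index-url https://download.pytorch.org/whl/cu121) should give you a build that bundles the CUDA runtime, so you don't need a separate system-wide CUDA install for inference.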