BitsAndBytes transformers issue

I’m trying to run my model with:

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
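
For reference, this is roughly how I'm loading it (the model id below is just a placeholder for my actual checkpoint):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)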

Trouble is, when I run it I get an error saying that some modules are dispatched on the CPU or the disk: "If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained."
I looked into it on Hugging Face and Google, and went through some of the code in modeling_utils.py, but I didn't see anything suggesting you can do something like load_in_4bit_fp32_cpu_offload=True or anything along those lines. I'm guessing it isn't a feature yet, or maybe it's coming? If anyone has any ideas about that I'd be really grateful. I'm going to test the 8-bit option and see if that's enough, but I'm not certain 8-bit will be small enough to fit the model, unfortunately.
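
For what it's worth, this is the 8-bit offload setup I'm planning to test, going by how I read the error message. I haven't verified it: the config-level flag in current transformers seems to be called llm_int8_enable_fp32_cpu_offload rather than load_in_8bit_fp32_cpu_offload, and the module names in the device_map are placeholders that depend on the architecture:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization with fp32 CPU offload for whatever doesn't fit on GPU
bnb_config_8bit = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # assumed config-level name of the flag from the error
)

# Custom device_map: keep most of the model on GPU 0, offload the rest to CPU.
# Module names are placeholders; they depend on the actual model architecture.
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",  # placeholder model id
    quantization_config=bnb_config_8bit,
    device_map=device_map,
)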

Same issue here while using alpaca_eval: error while using guanaco_33b as evaluator · Issue #134 · tatsu-lab/alpaca_eval · GitHub

It happens with device_map="auto" and load_in_4bit=True / load_in_8bit=True.
I don't get why it says some modules are dispatched on the CPU or the disk; that isn't the case for me.
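
One way I'm thinking of checking is to reconstruct the device map that device_map="auto" would produce and see whether anything actually lands on the CPU or the disk. This is only an approximation (it sizes the model at float16 rather than the quantized size), and the model id is a placeholder for the guanaco_33b checkpoint I'm using:

import torch
from accelerate import init_empty_weights, infer_auto_device_map
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/guanaco-33b"  # placeholder for the evaluator checkpoint

# Build the model skeleton without allocating weights, then let accelerate
# compute a device map similar to what device_map="auto" would use.
config = AutoConfig.from_pretrained(model_id)
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(empty_model, dtype=torch.float16)

# Anything mapped to "cpu" or "disk" would explain the error;
# if everything sits on GPU indices, something else is tripping the check.
offloaded = {name: dev for name, dev in device_map.items() if dev in ("cpu", "disk")}
print(offloaded)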