I’m trying to run my model with:
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
```
Trouble is, when I run it I get this error:

```
Some modules are dispatched on the CPU or the disk. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`.
```
Looking into it on Hugging Face and Google, and at some of the code in `modeling_utils.py`, I didn't see anything suggesting you could do something like `load_in_4bit_fp32_cpu_offload=True` or anything along those lines. I'm guessing it isn't a feature yet, or maybe it's coming? If anyone has any ideas about that I'd be really grateful. I'm going to test the 8-bit route and see if that's enough, but I'm not certain 8 bits will be small enough to load the model, unfortunately.
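For reference, here's a sketch of what I was going to try next. `BitsAndBytesConfig` does have an `llm_int8_enable_fp32_cpu_offload` flag (despite the "int8" in the name), so it might be worth testing whether it's honored in a 4-bit config too; I'm not sure it is. The model id and the module names in the `device_map` below are placeholders that would need to match the actual architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Despite the "int8" in the name, this is the flag the docs point to for
# keeping CPU-offloaded modules in fp32. Unclear to me whether 4-bit
# loading respects it -- this is just what I plan to test.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Placeholder device_map: the module names depend on the model; anything
# mapped to "cpu" would stay in fp32 if the offload flag is honored.
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "my-model-id",  # placeholder
    quantization_config=bnb_config,
    device_map=device_map,
)
```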