I’m trying to run my model with:
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
```
Trouble is, when I run it I get this error:

```
Some modules are dispatched on the CPU or the disk. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`.
```
Looking into it on Hugging Face and Google, and at some of the code in `modeling_utils.py`, I didn't see anything suggesting you could do something like `load_in_4bit_fp32_cpu_offload=True` or anything along those lines. I'm guessing it isn't a feature yet, or maybe it's coming? If anyone has any ideas about that I'd be really grateful. I'm going to test the 8-bit route and see if that's enough, but I'm not certain 8 bits will be small enough to load the model, unfortunately.
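For reference, here's a sketch of what I was going to try next. `BitsAndBytesConfig` does have an `llm_int8_enable_fp32_cpu_offload` flag (despite the "int8" in the name), so it might be worth testing whether it's honored in a 4-bit config too; I'm not sure it is. The model id and the module names in the `device_map` below are placeholders that would need to match the actual architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Despite the "int8" in the name, this is the flag the docs point to for
# keeping CPU-offloaded modules in fp32. Unclear to me whether 4-bit
# loading respects it -- this is just what I plan to test.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Placeholder device_map: the module names depend on the model; anything
# mapped to "cpu" would stay in fp32 if the offload flag is honored.
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "my-model-id",  # placeholder
    quantization_config=bnb_config,
    device_map=device_map,
)
```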