I want to fine-tune bloom-7b1
using QLoRA with 1x RTX 3090 (24 GB) and an Nvidia Titan (12 GB).
Is there a way to offload weights to the CPU?
The PEFT GitHub repo shows offloading results. How do I perform offloading with QLoRA?
Mess around with the device_map argument:

from accelerate import Accelerator
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,  # or load_in_8bit=True
    device_map={"": Accelerator().local_process_index},
)
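If the model doesn't fit on the GPUs alone, an alternative to a fixed device_map is device_map="auto" plus a max_memory dict, which lets accelerate place layers across both GPUs and spill the remainder to CPU RAM. A minimal sketch for your setup (the exact memory caps are assumptions, leave headroom for activations and optimizer state):

```python
# Hypothetical per-device memory caps for device_map="auto".
# Keys: 0 = RTX 3090 (24 GB), 1 = Titan (12 GB), "cpu" = system RAM.
# Anything that does not fit under these caps is offloaded to CPU.
max_memory = {
    0: "20GiB",    # leave headroom on the 3090
    1: "9GiB",     # leave headroom on the Titan
    "cpu": "48GiB",  # adjust to your actual system RAM
}

# The from_pretrained call would then look like (not executed here,
# since it needs the model weights downloaded):
# model = AutoModelForCausalLM.from_pretrained(
#     base_model,
#     quantization_config=quant_config,
#     device_map="auto",
#     max_memory=max_memory,
# )
```

Note that with 4-bit/8-bit quantization, CPU-offloaded modules are kept in full precision; if loading fails on the offloaded parts, look at the llm_int8_enable_fp32_cpu_offload option of BitsAndBytesConfig.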
Check out this
Notice how they offload layers to different GPUs/CPUs (in their case, 0 is gpu_0 and 1 would be your other GPU).
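To make the layer-to-device idea concrete, here is a sketch of a hand-written device_map for bloom-7b1. The module names follow the BLOOM implementation in transformers (transformer.h.<i> blocks); the 18/8/4 split across GPU 0, GPU 1, and CPU is an assumption you'd tune to what actually fits:

```python
# Hypothetical explicit device_map for bloom-7b1 (30 transformer blocks).
# 0 = your 3090, 1 = your Titan, "cpu" = offloaded to system RAM.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "transformer.ln_f": "cpu",
    "lm_head": "cpu",
}

# First 18 blocks on the 3090, next 8 on the Titan, last 4 on CPU.
for i in range(30):
    if i < 18:
        device_map[f"transformer.h.{i}"] = 0
    elif i < 26:
        device_map[f"transformer.h.{i}"] = 1
    else:
        device_map[f"transformer.h.{i}"] = "cpu"
```

You would then pass device_map=device_map to from_pretrained instead of the Accelerator-based map above. Keep trainable LoRA layers off the CPU if you can, since CPU-resident modules are much slower in both forward and backward passes.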