How to perform training with CPU + GPU offloading?

I want to fine-tune bloom-7b1 using QLoRA with one RTX 3090 (24 GB) and one Nvidia Titan (12 GB).
Is there a way to offload weights to the CPU?

The PEFT GitHub repo shows offloading results. How do I perform offloading with QLoRA?

Mess around with the device_map argument:

from accelerate import Accelerator
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,  # e.g. a BitsAndBytesConfig; load_in_8bit=True also works
    # The empty-string key means "the whole model": each process places its
    # copy on its own local GPU when launched with accelerate.
    device_map={"": Accelerator().local_process_index},
)
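
If you want accelerate to spread the weights over both GPUs and the CPU for you, a common pattern is device_map="auto" plus a max_memory budget. This is a minimal sketch, with the caveat that bitsandbytes 4-bit (QLoRA) weights generally cannot be offloaded to CPU, so it assumes an 8-bit config with llm_int8_enable_fp32_cpu_offload; the memory figures and the "bigscience/bloom-7b1" name are placeholders to adjust for your setup.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model = "bigscience/bloom-7b1"  # placeholder checkpoint name

# 8-bit quantization; the flag lets modules that do not fit on the GPUs stay
# on the CPU in fp32 instead of the load failing outright.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Per-device budgets (placeholders): leave headroom for activations, LoRA
# parameters and optimizer state. Anything over budget is placed on "cpu".
max_memory = {0: "20GiB", 1: "9GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",      # let accelerate spread layers over the GPUs and CPU
    max_memory=max_memory,
)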

Check this out:

Notice how they offload layers to different GPUs/CPUs (in their case, 0 is gpu_0 and 1 would be your other GPU).
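
For finer control you can also write the device_map by hand instead of using "auto". A hedged sketch using BLOOM-style module names (transformer.word_embeddings, transformer.h.<i>, transformer.ln_f, lm_head); bloom-7b1 has 30 blocks, but verify the names with print(model), and the 18/8/4 split between GPU 0, GPU 1 and CPU below is just an assumption to tune against your memory.

import torch
from transformers import AutoModelForCausalLM

base_model = "bigscience/bloom-7b1"  # placeholder checkpoint name

# Hand-written placement: first blocks on GPU 0, the next batch on GPU 1,
# the remainder offloaded to CPU.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": 0,            # tied to word_embeddings, so keep on the same device
    "transformer.ln_f": 1,
}
device_map.update({f"transformer.h.{i}": 0 for i in range(0, 18)})       # GPU 0 (3090)
device_map.update({f"transformer.h.{i}": 1 for i in range(18, 26)})      # GPU 1 (Titan)
device_map.update({f"transformer.h.{i}": "cpu" for i in range(26, 30)})  # CPU

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,  # half precision so the GPU shares actually fit
    device_map=device_map,
)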