How to perform training with CPU + GPU offloading?

I want to fine-tune bloom-7b1 using QLoRA with one RTX 3090 (24 GB) and one Nvidia Titan (12 GB).
Is there a way to offload weights to the CPU?

The PEFT GitHub repo shows offloading results. How do I perform offloading with QLoRA?

Mess around with the device_map argument:

from accelerate import Accelerator
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,  # e.g. a BitsAndBytesConfig; load_in_8bit=True also works
    # The empty-string key means "the whole model": each process places its
    # copy on its own local GPU when launched with accelerate.
    device_map={"": Accelerator().local_process_index},
)
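
If you want accelerate to spread the weights over both GPUs and the CPU for you, a common pattern is device_map="auto" plus a max_memory budget. This is a minimal sketch, with the caveat that bitsandbytes 4-bit (QLoRA) weights generally cannot be offloaded to CPU, so it assumes an 8-bit config with llm_int8_enable_fp32_cpu_offload; the memory figures and the "bigscience/bloom-7b1" name are placeholders to adjust for your setup.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model = "bigscience/bloom-7b1"  # placeholder checkpoint name

# 8-bit quantization; the flag lets modules that do not fit on the GPUs stay
# on the CPU in fp32 instead of the load failing outright.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Per-device budgets (placeholders): leave headroom for activations, LoRA
# parameters and optimizer state. Anything over budget is placed on "cpu".
max_memory = {0: "20GiB", 1: "9GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",      # let accelerate spread layers over the GPUs and CPU
    max_memory=max_memory,
)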

Check this out:

Notice how they offload layers to different GPUs/CPUs (in their case, 0 is gpu_0 and 1 would be your other GPU).
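
For finer control you can also write the device_map by hand instead of using "auto". A hedged sketch using BLOOM-style module names (transformer.word_embeddings, transformer.h.<i>, transformer.ln_f, lm_head); bloom-7b1 has 30 blocks, but verify the names with print(model), and the 18/8/4 split between GPU 0, GPU 1 and CPU below is just an assumption to tune against your memory.

import torch
from transformers import AutoModelForCausalLM

base_model = "bigscience/bloom-7b1"  # placeholder checkpoint name

# Hand-written placement: first blocks on GPU 0, the next batch on GPU 1,
# the remainder offloaded to CPU.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": 0,            # tied to word_embeddings, so keep on the same device
    "transformer.ln_f": 1,
}
device_map.update({f"transformer.h.{i}": 0 for i in range(0, 18)})       # GPU 0 (3090)
device_map.update({f"transformer.h.{i}": 1 for i in range(18, 26)})      # GPU 1 (Titan)
device_map.update({f"transformer.h.{i}": "cpu" for i in range(26, 30)})  # CPU

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,  # half precision so the GPU shares actually fit
    device_map=device_map,
)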