QLoRA with CPU offload

cerisara · August 21, 2023, 3:52pm

Hi,

I’m facing an error, “ValueError: You can’t train a model that has been loaded in 8-bit precision with CPU or disk offload.”, when I try to train LoRA parameters (all of them on the GPU) with all transformer parameters frozen and quantized to 4 bits (bnb) on the GPU. The catch is that, to save VRAM, I’m offloading the (unquantized) embeddings to the CPU with a device_map. This shouldn’t be a problem for training the LoRA parameters, but apparently it’s not supported?
Is there any workaround?
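For reference, the kind of device_map described above can be sketched roughly as follows. This is only an illustration, not the exact setup: the module names (model.embed_tokens, model.layers.N, model.norm, lm_head) assume a Llama-style causal LM, and build_device_map is a hypothetical helper, not a library function.

```python
def build_device_map(num_layers, gpu=0):
    """Hypothetical helper: embeddings on the CPU, everything else on one GPU.

    Module names assume a Llama-style layout; other architectures use
    different names, so inspect model.named_modules() before reusing this.
    """
    device_map = {"model.embed_tokens": "cpu"}  # unquantized embeddings offloaded to CPU
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = gpu   # 4-bit transformer blocks stay on the GPU
    device_map["model.norm"] = gpu
    device_map["lm_head"] = gpu
    return device_map


device_map = build_device_map(num_layers=32)

# The map would then be passed at load time, roughly like this (not run here,
# since it needs a GPU, bitsandbytes, and a real checkpoint name):
#
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   bnb_config = BitsAndBytesConfig(
#       load_in_4bit=True,
#       llm_int8_enable_fp32_cpu_offload=True,  # allow non-quantized modules on CPU
#   )
#   model = AutoModelForCausalLM.from_pretrained(
#       checkpoint_name,  # placeholder, not a real model id
#       quantization_config=bnb_config,
#       device_map=device_map,
#   )
```

Loading with a map like this generally works for inference; the ValueError quoted above appears only once training is attempted on the offloaded model.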

Thanks!

saireddy · February 22, 2024, 3:08am

Hey @cerisara,
were you able to find a solution for this?
