QLoRA with CPU offload


I’m facing the error “ValueError: You can’t train a model that has been loaded in 8-bit precision with CPU or disk offload.” when I try to train LoRA parameters (all of them on the GPU) with all transformer parameters frozen and quantized in 4 bits (bitsandbytes) on the GPU. The trick is that, to save VRAM, I’m offloading the (unquantized) embeddings onto the CPU (with a device_map). This shouldn’t be a problem for training the LoRA parameters, but apparently it’s not supported?
Does any workaround exist?
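For context, a `device_map` along these lines would reproduce the setup described: quantized transformer blocks on the GPU, embeddings offloaded to the CPU. This is only a sketch; the module names and layer count are illustrative for a LLaMA-style model and depend on the actual architecture.

```python
# Hypothetical device_map: the (unquantized) embedding table goes to the
# CPU to save VRAM, while the 4-bit transformer blocks stay on GPU 0.
# Module names ("model.embed_tokens", "model.layers.N", ...) are
# illustrative and vary by architecture.
device_map = {
    "model.embed_tokens": "cpu",                     # offloaded embeddings
    **{f"model.layers.{i}": 0 for i in range(32)},   # quantized blocks on GPU
    "model.norm": 0,
    "lm_head": 0,
}

# Loading would then look roughly like this (requires transformers,
# accelerate, and bitsandbytes, plus a CUDA GPU):
#
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-2-7b-hf",                    # illustrative checkpoint
#     quantization_config=BitsAndBytesConfig(load_in_4bit=True),
#     device_map=device_map,
# )
```

It is this mixed placement (any module mapped to `"cpu"` or `"disk"` while the model is quantized) that trips the check raised in the error above, even though the trainable LoRA parameters themselves live entirely on the GPU.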


Hey @cerisara ,
Were you able to find a solution for this?