Quantized model with LoRA takes much more GPU memory than the un-quantized model with LoRA (E-5-Large embedding transformer)

I am trying to use QLoRA, which in theory should use less memory than LoRA alone.

I am working through this tutorial from the PEFT library.

The tutorial code works well on a 16 GB NVIDIA Tesla GPU with a batch size of 96.

But with the same batch size, loading the model in 4-bit gives me an out-of-memory error.

Why is this happening?

        import torch
        from transformers import AutoModel, BitsAndBytesConfig

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
        )

        # device_map is an argument of from_pretrained, not of BitsAndBytesConfig
        self.model = AutoModel.from_pretrained(
            model_name,
            quantization_config=bnb_config,
            device_map="auto",
            trust_remote_code=True,
        )
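
For completeness, QLoRA then attaches LoRA adapters on top of this quantized base. A minimal sketch with PEFT, assuming a BERT-style encoder like E-5-Large (the r, lora_alpha, and target_modules values here are illustrative, not necessarily the tutorial's exact settings):

        from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

        # Freezes the quantized base weights and prepares the model for k-bit training
        model = prepare_model_for_kbit_training(self.model)

        lora_config = LoraConfig(
            r=8,                                # illustrative rank
            lora_alpha=16,
            lora_dropout=0.1,
            target_modules=["query", "value"],  # BERT-style attention projections
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()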

What model are you trying to use?
What LoRA r are you using?
Do you have other models loaded in your GPU memory?

As given in the tutorial, I am using E-5-Large. The LoRA config is the same as in the tutorial.

I double-checked the GPU consumption, and the quantized model takes more memory than the normal model with LoRA.
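
One way to make that comparison apples-to-apples is to use torch.cuda's peak-memory counters rather than nvidia-smi, since nvidia-smi also counts the CUDA context and the allocator's cached-but-unused blocks. A minimal sketch (the training step in the middle is a placeholder):

        import torch

        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()

        # ... run one training step (forward + backward) at the same batch size ...

        # Peak memory actually allocated by tensors, excluding the allocator cache
        peak_gib = torch.cuda.max_memory_allocated() / 1024**3
        print(f"Peak allocated: {peak_gib:.2f} GiB")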

I tried both 8-bit and 4-bit, and it is still the same. Any reason for this?
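
(An 8-bit load would be configured along these lines; a sketch, since the thread doesn't show the exact 8-bit code:)

        bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
        self.model = AutoModel.from_pretrained(
            model_name,
            quantization_config=bnb_config_8bit,
            device_map="auto",
            trust_remote_code=True,
        )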

Did it train faster, though?

And did you find the answer to your problem anywhere else?
Thank you.

The same question is also answered here: Results are inconsistent and is not reliable enough · Issue #1 · RahulSChand/gpu_poor · GitHub