I am trying to use QLoRA, which in theory should use less memory than LoRA alone.
I am working on this tutorial in the PEFT library.
The code below works well on a 16GB Nvidia Tesla with a batch size of 96.
But with the same batch size, when I load the model in 4-bit I get an out-of-memory error.
Why is this happening?
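For context, here is a rough weight-memory estimate (assuming a hypothetical 7B-parameter model; the exact size depends on the model in the tutorial) showing why 4-bit loading is expected to take less memory than 16-bit. Note it only covers the weights, not activations or optimizer state:

```python
# Back-of-envelope weight memory for an assumed 7B-parameter model (illustrative only).
params = 7e9
fp16_gb = params * 2 / 1e9    # 16-bit: 2 bytes per weight
nf4_gb = params * 0.5 / 1e9   # 4-bit NF4: 0.5 bytes per weight (ignoring quantization constants)

print(f"fp16 weights: {fp16_gb:.1f} GB")  # ~14.0 GB
print(f"nf4 weights:  {nf4_gb:.1f} GB")   # ~3.5 GB
```

Activation memory, which grows with batch size, is not quantized, so the weight savings are only part of the total footprint.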
```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
self.model = AutoModel.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # device_map belongs to from_pretrained, not BitsAndBytesConfig
    trust_remote_code=True,
)
```