Is this needed: bnb_4bit_use_double_quant=True?

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

Regarding the code above: I've seen two tutorials about QLoRA. One didn't have the 'bnb_4bit_use_double_quant=True' line, and the other did.

For a bit more clarity: the one that included the line was loading a Falcon-7B model with AutoModelForCausalLM, and the one that didn't include it was also using a Falcon model, but a sharded fp16 version. Does that make a difference?

Thanks! It's not a big deal, I'm just trying to understand it better.

It's optional: bnb_4bit_use_double_quant defaults to False, so the tutorial that omitted the line simply wasn't using double quantization. When enabled, it further reduces the average memory footprint by quantizing the quantization constants themselves.
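
To see where that saving comes from, here's the back-of-the-envelope arithmetic from the QLoRA paper. The block sizes below are the paper's defaults; treat the exact numbers as an assumption about the bitsandbytes internals rather than something guaranteed by the library:

# First-level quantization: one fp32 constant per block of 64 weights.
first_blocksize = 64
fp32_bits = 32
overhead_single = fp32_bits / first_blocksize            # 0.5 bits per parameter

# Double quantization: the constants themselves are quantized to 8 bits,
# with a second fp32 constant per block of 256 first-level constants.
second_blocksize = 256
int8_bits = 8
overhead_double = (int8_bits / first_blocksize
                   + fp32_bits / (first_blocksize * second_blocksize))

print(overhead_single - overhead_double)  # ~0.373, i.e. the "0.4 bits per parameter"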

See Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA: https://huggingface.co/blog/4bit-transformers-bitsandbytes

“options include bnb_4bit_use_double_quant which uses a second quantization after the first one to save an additional 0.4 bits per parameter.”

“A rule of thumb is: use double quant if you have problems with memory, use NF4 for higher precision, and use a 16-bit dtype for faster finetuning.”
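
If you want to measure the difference on your own setup, here's a minimal sketch comparing the two configs on the Falcon-7B checkpoint from your question. It assumes a GPU with enough memory to load the model; get_memory_footprint() is the standard transformers helper for the in-memory model size:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_falcon_4bit(double_quant):
    # Same config as in the question, with double quantization toggled.
    cfg = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=double_quant,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    # A sharded fp16 checkpoint loads the same way: sharding only changes
    # how the weights are stored on disk, not how they get quantized.
    return AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b",
        quantization_config=cfg,
        device_map="auto",
    )

for dq in (False, True):
    model = load_falcon_4bit(dq)
    print(f"double_quant={dq}: {model.get_memory_footprint() / 1e9:.2f} GB")
    del model
    torch.cuda.empty_cache()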