Understanding how changing bnb_4bit_compute_dtype affects outputs

I’m struggling to understand the impact of bnb_4bit_compute_dtype. Specifically: if I use the quantization config below and switch bnb_4bit_compute_dtype to float32, should nothing change in terms of the model's outputs/quality? My reasoning is that a 4-bit value should fit into both 16 and 32 bits.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    model_4bit = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
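
For context, a minimal way to test this empirically would be to load the same checkpoint twice with the two compute dtypes and compare greedy generations (placeholder model_id below; assumes a CUDA GPU with bitsandbytes installed):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")

    results = {}
    for dtype in (torch.float16, torch.float32):
        config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=dtype,
        )
        model = AutoModelForCausalLM.from_pretrained(
            model_id, quantization_config=config, device_map="auto"
        )
        with torch.no_grad():
            ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        results[str(dtype)] = tokenizer.decode(ids[0], skip_special_tokens=True)
        del model
        torch.cuda.empty_cache()

    print(results)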

Does anyone have an explanation for this?

From QLoRA paper:

“QLORA has one low-precision storage data type, in our case usually 4-bit, and one computation data type that is usually BFloat16. In practice, this means whenever a QLORA weight tensor is used, we dequantize the tensor to BFloat16, and then perform a matrix multiplication in 16-bit. For CausalLM models, the last lm_head is kept in its original dtype.”

This simply means that the tensors are stored in the 4-bit quantized format proposed by QLoRA, and whenever a computation needs to be performed, they are dequantized to the “compute” data type, which is usually FP16 or BF16. So switching bnb_4bit_compute_dtype to float32 does affect the outputs: the stored 4-bit weights are identical, but the matrix multiplications then run in FP32 instead of FP16, which produces slightly different numerical results (and is typically slower on GPU).
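
To make that concrete, here is a toy sketch in plain PyTorch (not the actual bitsandbytes NF4 kernel) of the store-low-bit / dequantize-to-compute-dtype / multiply pattern. Float32 vs float64 stand in for the float16 vs float32 choice so it runs on any machine, but the point is the same: the matmul precision follows the compute dtype, so the two results are close but not identical.

    import torch

    def fake_quantize_4bit(w):
        # Crude absmax quantization to 16 levels; real NF4 uses a
        # normal-float codebook rather than uniform steps.
        scale = w.abs().max() / 7.0
        q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
        return q, scale

    def quantized_matmul(x, q, scale, compute_dtype):
        # Dequantize to the compute dtype, then multiply in that dtype.
        w = q.to(compute_dtype) * scale.to(compute_dtype)
        return x.to(compute_dtype) @ w.T

    w = torch.randn(64, 64)
    x = torch.randn(1, 64)
    q, scale = fake_quantize_4bit(w)

    y_lo = quantized_matmul(x, q, scale, torch.float32)
    y_hi = quantized_matmul(x, q, scale, torch.float64)
    print((y_lo.double() - y_hi).abs().max())  # small, but not exactly zero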

Note that in QLoRA (a parameter-efficient fine-tuning technique), the purpose is to freeze the original layers (which are now quantized in 4-bit and only converted to bnb_4bit_compute_dtype when calculations are performed) and to train only the new adapter weights, either in FP32 or in mixed precision if fp16=True is specified during fine-tuning.
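
For completeness, here is a minimal sketch of that setup with PEFT, reusing the model_4bit object from the snippet above and assuming LLaMA-style module names for target_modules (these vary by architecture):

    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_4bit = prepare_model_for_kbit_training(model_4bit)

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # adjust to your model's layer names
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )

    model_4bit = get_peft_model(model_4bit, lora_config)
    # Only the adapter weights are trainable; the 4-bit base stays frozen.
    model_4bit.print_trainable_parameters()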
