I’m struggling to understand the impact of bnb_4bit_compute_dtype. Specifically, I’m thinking that if I were to use the quantization config below and switched bnb_4bit_compute_dtype to float32, nothing should change in terms of the model’s outputs/quality. My reasoning is that a 4-bit value should fit into both 16 and 32 bits.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# model_id is the checkpoint I'm loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
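For context, here is a minimal sketch of how I’d compare the two settings side by side (the model id and prompt are just placeholders for illustration, not the model I’m actually using):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

def generate_with_compute_dtype(model_id, compute_dtype, prompt):
    # Load the model in 4-bit NF4, varying only the compute dtype
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Greedy decoding, so any difference comes from the compute dtype
text_fp16 = generate_with_compute_dtype("facebook/opt-125m", torch.float16, "The capital of France is")
text_fp32 = generate_with_compute_dtype("facebook/opt-125m", torch.float32, "The capital of France is")
print(text_fp16 == text_fp32)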
Does anyone have an explanation for this?