Parameter Count & Shape Discrepancies in 4-bit vs. Higher-bit LLMs

I’m delving into an intriguing issue related to model quantization and seeking insights from the community. Specifically, I’ve observed a curious difference in the parameter count and shape between 4-bit and 8/16/32-bit models.

I loaded the same model at different bit widths. The 4-bit model reports different parameter counts and shapes than its 8/16/32-bit counterparts. I checked this on both Phi and Llama.

from transformers import AutoModelForCausalLM

# load_in_4bit / load_in_8bit require the bitsandbytes package
m1 = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True, load_in_4bit=True)
m2 = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True, load_in_8bit=True)

print(list(m1.parameters())[1].shape, list(m2.parameters())[1].shape)
# Output: (torch.Size([3276800, 1]), torch.Size([2560, 2560]))

Why does the 4-bit model exhibit different parameter shapes and counts in comparison to higher-bit models? Is this a typical result of the quantization process or something else?

I found a related discussion that hints at quantization impacting trainable parameters: "Less Trainable Parameters after quantization".

Any insights or theories on why this discrepancy occurs would be incredibly valuable.

I have the same question. Have you gotten an answer yet? @kdcyberdude

Hi @winnieyangwannan,

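To explain briefly, it’s due to the packing of 4-bit quantized weights: one byte holds two parameters of 4 bits each. In the example above, the 2560 × 2560 weight has 6,553,600 values, which pack into 3,276,800 bytes, hence the shape [3276800, 1].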
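
Here is a minimal sketch of the packing idea in plain Python, in case it helps to see the arithmetic. The helper names (`pack_4bit`, `unpack_4bit`, `packed_shape`) are made up for illustration, and the nibble order (first value in the high bits) is an assumption; the actual bitsandbytes storage layout may differ.

```python
def pack_4bit(values):
    """Pack a flat list of 4-bit codes (0..15) into bytes, two codes per byte.

    Assumption for illustration: the first code of each pair goes in the high nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("every value must fit in 4 bits (0..15)")
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def unpack_4bit(packed):
    """Inverse of pack_4bit: split each byte back into two 4-bit codes."""
    out = []
    for b in packed:
        out.append(b >> 4)    # high nibble
        out.append(b & 0x0F)  # low nibble
    return out


def packed_shape(rows, cols):
    """Shape of the packed storage for a rows x cols weight, as a column of bytes."""
    n = rows * cols
    return ((n + 1) // 2, 1)


codes = [3, 15, 0, 7, 9, 1]
packed = pack_4bit(codes)
assert unpack_4bit(packed) == codes  # round trip is lossless for 4-bit codes
print(len(codes), "codes ->", len(packed), "bytes")
print(packed_shape(2560, 2560))
```

So counting elements in the packed tensor gives half the logical parameter count for every quantized layer, which is why the totals differ from the 8/16/32-bit models.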

For a more detailed explanation, feel free to check out this discussion: Hugging Face Forum - Impact of Quantization on Trainable Parameters.