Parameter Count & Shape Discrepancies in 4-bit vs. Higher-Bit LLMs

Hi @winnieyangwannan,

To explain briefly, it’s due to how 4-bit quantized weights are packed: each byte stores two 4-bit parameters, so the reported parameter count (and one dimension of each packed weight’s shape) appears halved compared to 8-bit or 16-bit models.
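For intuition, here is a minimal sketch of that packing idea in PyTorch. It is not the actual layout used by bitsandbytes or any specific quantization library, just an illustration of why a tensor of 4-bit values ends up with half as many stored elements:

```python
import torch

def pack_4bit(weights_int4: torch.Tensor) -> torch.Tensor:
    """Toy packer: store pairs of values in [0, 15] two-per-byte (illustrative only)."""
    assert weights_int4.numel() % 2 == 0
    w = weights_int4.to(torch.uint8)
    low = w[0::2]          # first value of each pair -> low nibble
    high = w[1::2] << 4    # second value of each pair -> high nibble
    return low | high      # half as many elements as the input

original = torch.randint(0, 16, (8,))   # 8 "parameters", each representable in 4 bits
packed = pack_4bit(original)             # stored in 4 uint8 bytes

print(original.shape, packed.shape)      # torch.Size([8]) torch.Size([4])
```

So when you inspect a 4-bit model, the packed storage tensor shows half the element count of the logical weight matrix, even though the number of logical parameters is unchanged.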

For a more detailed explanation, feel free to check out this discussion: Hugging Face Forum - Impact of Quantization on Trainable Parameters.