Parameter Count & Shape Discrepancies in 4-bit vs. Higher-Bit LLMs

Hi @winnieyangwannan,

To explain briefly, it’s due to how 4-bit quantized weights are packed: each byte stores two 4-bit parameters, so the reported parameter count (and one dimension of each packed weight’s shape) appears halved compared to 8-bit or 16-bit models.
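For intuition, here is a minimal sketch of that packing idea in PyTorch. It is not the actual layout used by bitsandbytes or any specific quantization library, just an illustration of why a tensor of 4-bit values ends up with half as many stored elements:

```python
import torch

def pack_4bit(weights_int4: torch.Tensor) -> torch.Tensor:
    """Toy packer: store pairs of values in [0, 15] two-per-byte (illustrative only)."""
    assert weights_int4.numel() % 2 == 0
    w = weights_int4.to(torch.uint8)
    low = w[0::2]          # first value of each pair -> low nibble
    high = w[1::2] << 4    # second value of each pair -> high nibble
    return low | high      # half as many elements as the input

original = torch.randint(0, 16, (8,))   # 8 "parameters", each representable in 4 bits
packed = pack_4bit(original)             # stored in 4 uint8 bytes

print(original.shape, packed.shape)      # torch.Size([8]) torch.Size([4])
```

So when you inspect a 4-bit model, the packed storage tensor shows half the element count of the logical weight matrix, even though the number of logical parameters is unchanged.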

For a more detailed explanation, feel free to check out this discussion: Hugging Face Forum - Impact of Quantization on Trainable Parameters.