Fewer Trainable Parameters after Quantization

As for the number of trainable parameters…

The print_trainable_parameters function iterates over the "named parameters" (the individual weight matrices) and, if a matrix is set to train, adds all of its elements to the tally. ChatGPT's comments about individual values being trainable or not are leading us astray; that's not relevant here (and I don't know whether it's even true :man_shrugging:).
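
For reference, here's a minimal sketch of what that counting loop does (a paraphrase of the logic, not PEFT's exact source):

```python
def count_trainable_parameters(model):
    """Tally elements across named parameters, split by trainability.

    A paraphrase of print_trainable_parameters, not the library's
    exact implementation.
    """
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        n = param.numel()  # number of elements in this weight matrix
        total += n
        if param.requires_grad:  # i.e. this matrix is set to train
            trainable += n
    print(f"trainable params: {trainable} || all params: {total} "
          f"|| trainable%: {100 * trainable / total:.4f}")
```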

So loading in 4-bit breaks that parameter counting code: bitsandbytes packs two 4-bit weights into each stored uint8 element, so numel() on a quantized matrix reports half the true count. Fixing the total isn't as simple as doubling it, either, because not every matrix gets quantized; note how the Mistral embedding matrix didn't change size in the quantized version.
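
If you want to see this directly, a sketch like the following should do it. The model name is just an example, it assumes bitsandbytes is installed, and the Params4bit class-name check is my assumption about how bitsandbytes labels its packed 4-bit weights:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Example checkpoint; any bitsandbytes-quantizable model behaves the same way.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

for name, param in model.named_parameters():
    # Quantized linear weights show up as packed uint8 tensors with half
    # the true element count; the embedding matrix keeps its original
    # dtype and shape, so its numel() is unchanged.
    print(name, param.dtype, tuple(param.shape), param.numel())

def count_params_4bit_aware(model):
    """Corrected tally: count each packed 4-bit element as two parameters."""
    total = 0
    for _, param in model.named_parameters():
        n = param.numel()
        if param.__class__.__name__ == "Params4bit":
            n *= 2  # two 4-bit values per stored element
        total += n
    return total

print(count_params_4bit_aware(model))
```

In the printout you should see the quantized linear layers as uint8 with roughly half their original element counts, while the embedding weight keeps its full shape, which is why a blanket doubling would overcount.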