I have been playing with quantization and PEFT, and I noticed that the number of trainable parameters is significantly reduced after applying quantization (but before applying PEFT). Does anyone know why this happens? Is it normal?
Here is an example:
Counting Functions:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

def count_trainable_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
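As a sanity check, both helpers report 100% trainable on a plain PyTorch module (a minimal example using a hypothetical toy nn.Linear, just to show the expected output):

import torch.nn as nn

tiny = nn.Linear(4, 2)  # toy module: 4*2 weights + 2 biases = 10 params
print_trainable_parameters(tiny)
count_trainable_params(tiny)
# trainable params: 10 || all params: 10 || trainable%: 100.0
# 10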
Normal Model:
from transformers import AutoModelForCausalLM
model_name = "ehartford/WizardLM-Uncensored-Falcon-7b"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)
print_trainable_parameters(model)
count_trainable_params(model)
# trainable params: 6921725248 || all params: 6921725248 || trainable%: 100.0
# 6921725248
Quantized One:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_name = "ehartford/WizardLM-Uncensored-Falcon-7b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
print_trainable_parameters(model)
count_trainable_params(model)
# trainable params: 295773504 || all params: 3608749376 || trainable%: 8.19601122668819
# 295773504
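To dig into where the difference comes from, it helps to group the parameters by class, dtype, and requires_grad (a rough diagnostic sketch; it assumes the 4-bit weights are stored as bitsandbytes Params4bit tensors, which is exactly what I'm trying to confirm):

from collections import Counter

# Tally parameter types on the quantized model loaded above.
kinds = Counter(
    (type(p).__name__, str(p.dtype), p.requires_grad)
    for _, p in model.named_parameters()
)
for kind, count in kinds.items():
    print(kind, count)

If the 4-bit weights really are packed two values per byte and frozen (requires_grad=False), that would explain both the roughly halved "all params" count and the much smaller trainable count, but I'd appreciate confirmation that this is the expected behavior.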