I was running a model configured with LoRA, like this:
```python
# model is already loaded from Hugging Face (Falcon, built via from_config)
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    # target_modules=["query_key_value"],
    lora_dropout=0.1,
    bias="none",
)
lora_model = get_peft_model(model, lora_config)
model = lora_model
```
I thought that if I didn't set target_modules to specific layers, this would behave the same as the original model. But I found that these few lines reduce my memory usage from 57GB to 11GB per device.
Can someone please tell me why this could happen?
P.S.: running with a Hugging Face Falcon model built via from_config, DeepSpeed Stage 3, text classification.
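
In case it helps with diagnosing, here is a minimal check I can run to compare trainable vs. total parameters before and after wrapping with LoRA (the count_params helper below is just something I wrote for this; print_trainable_parameters is PEFT's own method):

```python
from peft import LoraConfig, get_peft_model

def count_params(m):
    # count trainable vs. total parameters of a module
    trainable = sum(p.numel() for p in m.parameters() if p.requires_grad)
    total = sum(p.numel() for p in m.parameters())
    return trainable, total

# `model` is the Falcon classifier built via from_config (as in the snippet above)
print("before LoRA:", count_params(model))

lora_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.1, bias="none")
lora_model = get_peft_model(model, lora_config)

print("after LoRA:", count_params(lora_model))
lora_model.print_trainable_parameters()  # PEFT's summary of trainable parameters
```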