I was running a model configured with LoRA:
```python
# load model from huggingface
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    # target_modules=["query_key_value"],
    lora_dropout=0.1,
    bias="none",
)
lora_model = get_peft_model(model, lora_config)
model = lora_model
```
I thought that if I didn’t set target_modules to specific layers, this would behave the same as the original model. But I found that these few lines reduced my device memory usage from 57GB to 11GB per device.
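For context on the size of the effect, here is a back-of-the-envelope sketch of the per-parameter training overhead (gradients plus Adam optimizer states) for a fully trainable model versus a LoRA-wrapped model where the base weights are frozen. All numbers are assumptions for illustration: a 7B-parameter base model, roughly 20M trainable adapter parameters at r=16, fp16 gradients, and two fp32 Adam moments per trainable parameter.

```python
# Hypothetical sizes, not measured values:
base_params = 7_000_000_000   # all trainable in full fine-tuning
lora_params = 20_000_000      # rough adapter count at r=16 (assumption)

def training_overhead_gb(trainable):
    """Approximate gradient + Adam state memory for `trainable` parameters."""
    grads = trainable * 2            # fp16 gradients: 2 bytes each
    adam_states = trainable * 4 * 2  # two fp32 moments: 4 bytes each
    return (grads + adam_states) / 1e9

print(f"full fine-tune overhead: {training_overhead_gb(base_params):.0f} GB")
print(f"LoRA-only overhead: {training_overhead_gb(lora_params):.2f} GB")
```

Under these assumptions, freezing the base weights removes tens of GB of gradient and optimizer-state memory, which is the same order of magnitude as the drop I observed.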
Can someone please tell me why this could happen?
PS: running with the Hugging Face Falcon model built via from_config, DeepSpeed stage 3, text classification.