I recently needed to use the `peft` library for LoRA fine-tuning, and at the same time I want to directly fine-tune certain layers of the base model, such as `lm_head`. I would like to save the parameters of my fine-tuned base model. Using `modules_to_save=["lm_head"]` in `LoraConfig` is convenient, but it results in high GPU memory usage. Alternatively, explicitly setting `requires_grad=True` for `lm_head` and saving it after training with `unload()` and `model.save_pretrained("xxx")` leads to lower memory consumption. What are the differences between these two approaches in terms of how the model is saved?
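For context, here is a minimal sketch of the two approaches being compared. It assumes a small causal LM (`gpt2`, whose fused attention projection is `c_attn`); the hyperparameters and checkpoint paths are illustrative only, and note that gpt2 ties `lm_head` to its input embeddings, so an untied model is a cleaner test of this pattern.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# --- Approach 1: modules_to_save --------------------------------------------
base = AutoModelForCausalLM.from_pretrained("gpt2")
cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],      # architecture-dependent
    modules_to_save=["lm_head"],    # PEFT keeps a separate trainable copy of lm_head
)
model = get_peft_model(base, cfg)
# ... training loop ...
model.save_pretrained("adapter_ckpt")  # the lm_head copy is written into the adapter checkpoint

# --- Approach 2: manual requires_grad + unload() -----------------------------
base = AutoModelForCausalLM.from_pretrained("gpt2")
cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
model = get_peft_model(base, cfg)
for p in model.base_model.model.lm_head.parameters():
    p.requires_grad = True          # train lm_head in place, no extra copy
# ... training loop ...
full = model.unload()               # strips the LoRA layers without merging them
                                    # (merge_and_unload() would fold the LoRA deltas in)
full.save_pretrained("full_model_ckpt")  # a full base-model checkpoint, incl. the tuned lm_head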
In theory, there doesn't seem to be much difference in the results, but the behavior may differ significantly in practice. I think `save_pretrained()` is most likely to be well maintained.
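One concrete way to see the difference is to compare what each path writes to disk. This sketch reuses the hypothetical `adapter_ckpt` / `full_model_ckpt` directories from the sketch above and assumes the current peft/transformers default file names (`adapter_model.safetensors` for the adapter, `model.safetensors` for a full save).

```python
from safetensors.torch import load_file

# Approach 1: a small adapter checkpoint that also carries the lm_head copy.
adapter_state = load_file("adapter_ckpt/adapter_model.safetensors")
print(len(adapter_state), [k for k in adapter_state if "lm_head" in k])

# Approach 2: a full base-model checkpoint. With tied embeddings (e.g. gpt2)
# the lm_head weight may be stored under the embedding's key instead.
full_state = load_file("full_model_ckpt/model.safetensors")
print(len(full_state))
```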
Hello everyone!
Could someone explain why, when I add LoRA adapters to a diffusion model using the `add_adapter` method, the layers I would like to keep trainable (listed in the config parameter `modules_to_save`) are actually not trainable?
They are trainable if I use the `get_peft_model` method. Is it a bug, or am I misunderstanding something?
[Screenshot from 2025-01-30 17-15-05]
diffusers 0.32.1
peft 0.14.0
from diffusers import AutoencoderKL
fro…
[Linked GitHub issue, opened 12 Apr 2025, closed 17 Apr 2025]
**Summary:**
When using `peft.LoraConfig` with `modules_to_save=["proj_out"]`, … the parameters inside `proj_out` (which are `requires_grad=True` before adapter addition) are unexpectedly frozen (`requires_grad=False`) after calling `add_adapter()`.
This seems to contradict the intended functionality of `modules_to_save`, which should preserve the training status of specified modules.
---
**Environment:**
- `peft==0.15.1` (latest)
- `transformers`, `diffusers` up-to-date
- Model: [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V)
- Python 3.10
- Torch 2.6.0+cu118
---
**Code to Reproduce:**
```python
from diffusers.models import CogVideoXTransformer3DModel
import torch
from peft import LoraConfig
pretrained_model_name_or_path = "THUDM/CogVideoX-5b-I2V"
transformer = CogVideoXTransformer3DModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
print("before LoRA")
print(transformer.proj_out.weight.requires_grad) # True
print(transformer.proj_out.bias.requires_grad) # True
transformer_lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights=True,
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    modules_to_save=["proj_out"],
)
transformer.add_adapter(transformer_lora_config)
print("after LoRA")
print(transformer.proj_out.weight.requires_grad) # ❌ False (unexpected)
print(transformer.proj_out.bias.requires_grad) # ❌ False (unexpected)
```
---
**Expected behavior:**
After calling `add_adapter()`, since `"proj_out"` is explicitly listed in `modules_to_save`, both `.weight` and `.bias` of `transformer.proj_out` should **remain trainable**, i.e., `requires_grad=True`.
---
**Observed behavior:**
Both `.weight` and `.bias` of `proj_out` become `requires_grad=False`, even though `"proj_out"` appears exactly in `named_parameters()` before the adapter is added.
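For comparison, the forum post above reports that the same modules stay trainable when the adapter is attached via `get_peft_model` instead of `add_adapter`. A quick check along those lines (a sketch reusing `transformer` and `transformer_lora_config` from the reproduction, on a freshly loaded transformer, i.e. before `add_adapter` is called):

```python
from peft import get_peft_model

# Attach the same LoRA config with get_peft_model instead of add_adapter.
peft_transformer = get_peft_model(transformer, transformer_lora_config)

# With modules_to_save, PEFT creates a trainable copy of proj_out;
# its parameters are expected to report requires_grad=True here.
for name, param in peft_transformer.named_parameters():
    if "proj_out" in name:
        print(name, param.requires_grad)
```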
I followed this blog post for saving and loading a PEFT model. I defined my model as
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained('roberta-base', return_dict=True)
peft_config = LoraConfig(task_type="SEQ_CLS", inference_mode=False, r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
I saved it with
save_path = './test_save'
model.save_pretrained(save_path)
and load it with
config = PeftConfig.from_pretrained(save_…
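For reference, the usual loading pattern from the PEFT documentation looks roughly like this (a sketch; the exact code in the post above may differ):

```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSequenceClassification

save_path = './test_save'

# Read the adapter config to find the base model it was trained on.
config = PeftConfig.from_pretrained(save_path)

# Re-create the base model, then attach the saved LoRA adapter to it.
base_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path, return_dict=True)
model = PeftModel.from_pretrained(base_model, save_path)
```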