How to merge a LoRA adapter into an FSDP-wrapped model

I use the DPO trainer to train the model with FSDP via Accelerate.

After training is done, I want to merge the adapter into the base model.

I am using the code below, but it throws an error saying it cannot find the layer name.

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from peft import PeftModel

with FSDP.summon_full_params(trainer.model):
    base_model = trainer.model.unload()
    print(base_model)  # I see it still has the FSDP layer names inside: (_fsdp_wrapped_module)
    peft_model = PeftModel.from_pretrained(base_model, adapter_path)
    merged_model = peft_model.merge_and_unload()
    merged_model.save_pretrained(output_path, safe_serialization=False)
    self.tokenizer.save_pretrained(output_path)

The error looks like:
[rank0]: AssertionError: FSDP assumes model.layers.0.self_attn.q_proj.base_layer.weight is in the state_dict but the state_dict only has odict_keys([....


It seems possible that assertions are being made where they should not be. Not so much a bug, apparently…

After I run trainer.model.unload(), it still has the FSDP-wrapped layer names. The saved adapter state dict does not have these names because I save the full state dict.
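
For context, this is roughly how a full (unsharded) state dict is gathered from an FSDP model; a sketch only, assuming trainer.model is the FSDP-wrapped module. The resulting keys no longer carry the _fsdp_wrapped_module prefix, which is why they don't match the wrapped module's names.

from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

# Gather the full state dict on rank 0 only, offloaded to CPU.
cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(trainer.model, StateDictType.FULL_STATE_DICT, cfg):
    full_state_dict = trainer.model.state_dict()  # parameter names without the FSDP wrapper prefix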

What I do now is reload the base model with the normal from_pretrained method, then use PeftModel.from_pretrained to convert it to a PEFT model, and then run merge_and_unload, as sketched below.
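
A minimal sketch of that workaround; base_model_name is a placeholder, while adapter_path and output_path are the same values as in the snippet above.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model outside of FSDP so the parameter names are unwrapped.
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="auto")

# Attach the saved LoRA adapter, then fold its weights into the base model.
peft_model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = peft_model.merge_and_unload()

# Save the merged model and tokenizer.
merged_model.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(output_path)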

Is it possible to merge the adapter directly from the FSDP-wrapped PEFT model?
