Models are the same after loading LoRA parameters using the PEFT library

manujmalik · January 24, 2024, 2:54am

Hi, I created a LoRA adapter and tried to merge it with the base model, but somehow the new model and the original model give the same logits.
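For reference, the arithmetic behind LoRA and merging: a LoRA-wrapped linear layer computes h = W x + (alpha / r) B A x. With the standard initialization (which is also PEFT's default), B starts at zero, so an untrained adapter leaves the output exactly unchanged. Merging folds the same update into the weight, W_merged = W + (alpha / r) B A. The toy PyTorch sketch below checks both points; the sizes, alpha, and random tensors are arbitrary assumptions, not values from this adapter.

```python
import torch

torch.manual_seed(0)
d_in, d_out, r, alpha = 8, 6, 2, 16  # arbitrary toy sizes, chosen for illustration
scale = alpha / r

W = torch.randn(d_out, d_in)  # frozen base weight
A = torch.randn(r, d_in)      # lora_A weight
B = torch.zeros(d_out, r)     # lora_B weight, zero at initialization
x = torch.randn(3, d_in)      # a batch of three inputs


def lora_forward(x, W, A, B, scale):
    """Unmerged LoRA forward: base output plus the scaled low-rank update."""
    return x @ W.T + scale * (x @ A.T) @ B.T


base_out = x @ W.T

# With B == 0 the low-rank path adds exactly zero, so the outputs are identical.
print(torch.equal(lora_forward(x, W, A, B, scale), base_out))  # True

# A trained adapter has a nonzero B, which changes the output.
B_trained = torch.randn(d_out, r)
lora_out = lora_forward(x, W, A, B_trained, scale)
print(torch.allclose(lora_out, base_out))  # False

# Merging gives the same result with a single weight matrix.
W_merged = W + scale * B_trained @ A
print(torch.allclose(x @ W_merged.T, lora_out, atol=1e-4))  # True
```

So identical logits are expected only if the LoRA update is effectively zero or is not part of one of the two forward passes being compared.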

base_model is as follows:

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)

and the LoRA model is created with the following code:


from peft import LoraConfig, PeftModel

# base_model is the LlamaForCausalLM printed above
expert_lora_path = '/lora-llm/llama-2-7b-expert-shakespeare'
expert_lora_config = LoraConfig.from_pretrained(expert_lora_path)
expert_peft_model = PeftModel.from_pretrained(base_model, expert_lora_path, device_map='cuda').to('cuda')

and is as follows:

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
    )
  )
)
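As a worked example of what this printout implies: rank-32 adapters on q_proj and v_proj in each of the 32 decoder layers (neither lora_A nor lora_B has a bias) give the following adapter size. This is arithmetic from the printed structure only, not from the adapter's saved config.

```python
# Values read off the printed PeftModelForCausalLM structure.
hidden = 4096  # in/out features of q_proj and v_proj
rank = 32      # lora_A out_features == lora_B in_features
layers = 32    # "(0-31): 32 x LlamaDecoderLayer"
targets = 2    # only q_proj and v_proj are wrapped (k_proj, o_proj and the MLP are not)

per_module = hidden * rank + rank * hidden  # lora_A weight + lora_B weight
total = per_module * targets * layers
print(per_module, total)  # 262144 16777216
```

That is about 16.8M adapter parameters, a small fraction of the roughly 7B in the base model.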

Even though I can clearly see the LoRA modules injected into base_model, the logits remain the same.
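A detail that may explain this: PeftModel.from_pretrained injects the LoRA layers into the model object it is given rather than into a copy, which is why the lora.Linear wrappers show up inside base_model itself. The toy PyTorch sketch below reproduces that aliasing with hypothetical ToyLoraLinear and ToyPeftWrapper classes (stand-ins written for illustration, not PEFT's classes). After wrapping, the "original" model and the wrapper hold the very same parameter tensors and run the same computation, so comparing them can only report that they are identical.

```python
import copy

import torch
from torch import nn


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4, bias=False)

    def forward(self, x):
        return self.proj(x)


class ToyLoraLinear(nn.Module):
    """Hypothetical stand-in for a LoRA-wrapped linear layer."""

    def __init__(self, base_layer: nn.Linear, r: int = 2):
        super().__init__()
        self.base_layer = base_layer
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)

    def forward(self, x):
        return self.base_layer(x) + self.lora_B(self.lora_A(x))


class ToyPeftWrapper(nn.Module):
    """Hypothetical wrapper that, like PEFT, modifies the model it receives in place."""

    def __init__(self, model: nn.Module):
        super().__init__()
        model.proj = ToyLoraLinear(model.proj)  # swaps the submodule on the caller's object
        self.model = model

    def forward(self, x):
        return self.model(x)


torch.manual_seed(0)
base_model = ToyModel()
pristine = copy.deepcopy(base_model)  # independent copy taken BEFORE wrapping
wrapped = ToyPeftWrapper(base_model)

print(wrapped.model is base_model)     # True: same object
print(type(base_model.proj).__name__)  # ToyLoraLinear: base_model itself was changed
print(type(pristine.proj).__name__)    # Linear: the copy was not

x = torch.randn(2, 4)
print(torch.equal(wrapped(x), base_model(x)))   # True: identical computation
print(torch.allclose(wrapped(x), pristine(x)))  # False: only the copy is LoRA-free
```

In the real setup, a fair comparison needs a base model that PEFT never touched: load a second copy with from_pretrained, or deepcopy base_model before calling PeftModel.from_pretrained. PeftModel also offers a disable_adapter() context manager, which runs the same object with the LoRA layers switched off.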

I double-checked this by comparing the parameters of the two models with the following code:


flag = True
# flag stays True only if every paired parameter tensor is element-wise identical
for p1, p2 in zip(base_model.parameters(), expert_peft_model.parameters()):
    if p1.data.ne(p2.data).sum() > 0:
        flag = False
print(flag)
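Two caveats about a positional check like this. First, zip over parameters() pairs tensors by position; the PEFT model has extra lora_A/lora_B parameters, so for two genuinely separate models the pairs can drift out of alignment, and zip silently stops at the shorter list. Second, if both iterators yield the very same tensor objects, the check is trivially True. A sketch of a name-based comparison that also reports shared storage, assuming two nn.Module objects on the same device (compare_parameters is a hypothetical helper, not a PEFT or PyTorch API):

```python
import torch
from torch import nn


def compare_parameters(model_a: nn.Module, model_b: nn.Module) -> dict:
    """Hypothetical helper: compare parameters by name rather than by position.

    Assumes both models live on the same device.
    """
    params_a = dict(model_a.named_parameters())
    params_b = dict(model_b.named_parameters())
    report = {
        "only_in_a": sorted(params_a.keys() - params_b.keys()),
        "only_in_b": sorted(params_b.keys() - params_a.keys()),
        "shared_storage": [],  # same memory: not independent copies at all
        "different": [],       # same name, but different shape or values
    }
    for name in sorted(params_a.keys() & params_b.keys()):
        a, b = params_a[name], params_b[name]
        if a.data_ptr() == b.data_ptr():
            report["shared_storage"].append(name)
        elif a.shape != b.shape or not torch.equal(a, b):
            report["different"].append(name)
    return report


torch.manual_seed(0)
m1 = nn.Linear(4, 4)
m2 = nn.Linear(4, 4)  # separately initialized, so its values differ from m1
print(compare_parameters(m1, m1)["shared_storage"])  # ['bias', 'weight']
print(compare_parameters(m1, m2)["different"])       # ['bias', 'weight']
```

With the setup in this post, a more meaningful check would be compare_parameters(fresh_base_model, expert_peft_model.merge_and_unload()), where fresh_base_model (a hypothetical name) is a separately loaded copy of the base model. After merge_and_unload the parameter names match the base model again, so trained q_proj/v_proj weights should appear under "different".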

This prints True. I'm confused about whether something is wrong in my implementation or whether there was an error during training.
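On the training question, one quick check is whether the loaded lora_B weights are still all zero: in standard LoRA (and PEFT's default initialization) B starts at zero, so an adapter whose B matrices never changed is a no-op. A minimal sketch, assuming PEFT's parameter naming where LoRA weights contain "lora_B" (as in the printout, where lora_B is a ModuleDict keyed by "default"); the helper zero_lora_b_params and the toy module are hypothetical:

```python
import torch
from torch import nn


def zero_lora_b_params(model: nn.Module):
    """Hypothetical helper: return (number of lora_B params, names of all-zero ones)."""
    lora_b = [(n, p) for n, p in model.named_parameters() if "lora_B" in n]
    zero = [n for n, p in lora_b if torch.count_nonzero(p).item() == 0]
    return len(lora_b), zero


class ToyLoraLayer(nn.Module):
    """Toy module mimicking PEFT's naming (lora_A/lora_B ModuleDicts keyed by adapter)."""

    def __init__(self):
        super().__init__()
        self.lora_A = nn.ModuleDict({"default": nn.Linear(4, 2, bias=False)})
        self.lora_B = nn.ModuleDict({"default": nn.Linear(2, 4, bias=False)})
        nn.init.zeros_(self.lora_B["default"].weight)  # LoRA's usual zero init for B


toy = nn.ModuleDict({"q_proj": ToyLoraLayer(), "v_proj": ToyLoraLayer()})
print(zero_lora_b_params(toy))
# (2, ['q_proj.lora_B.default.weight', 'v_proj.lora_B.default.weight'])

# Simulate a training step that only updated q_proj's B matrix.
with torch.no_grad():
    toy["q_proj"].lora_B["default"].weight.fill_(0.01)
print(zero_lora_b_params(toy))  # (2, ['v_proj.lora_B.default.weight'])
```

On the real model, zero_lora_b_params(expert_peft_model) would list any adapter layers whose B matrices are still zero. If all 64 of them are (q_proj and v_proj in 32 layers), the problem is more likely on the training or saving side than in the loading code.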
