PPOTrainer + LoRA and Continued Training

Hi all,

So, currently, I’m training a model with PPOTrainer and LoRA.

When I do

model.save_pretrained(…)

it saves both an adapter_model.safetensors and a pytorch_model.bin.

What is the difference between the two? They are both in the same directory, but it seems that when I load the model via from_pretrained, it uses the adapter_model.

Does pytorch_model.bin also have the LoRA adapters merged in?
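
For reference, this is how I have been peeking at what each file actually contains (the checkpoint path here is just a placeholder):

import torch
from safetensors import safe_open

ckpt_dir = "path/to/ppo_checkpoint"  # placeholder

# Keys in the PEFT adapter file (the LoRA tensors).
with safe_open(f"{ckpt_dir}/adapter_model.safetensors", framework="pt", device="cpu") as f:
    print(list(f.keys())[:10])

# Keys in pytorch_model.bin (this is where I would expect the v_head to show up).
full_state_dict = torch.load(f"{ckpt_dir}/pytorch_model.bin", map_location="cpu")
print(list(full_state_dict.keys())[:10])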

Additionally, I want to continue PPO training from a checkpoint. I recreate the policy and reference model the same way as before,

from trl import AutoModelForCausalLMWithValueHead

policy = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
    quantization_config=nf4_config,
)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name, quantization_config=nf4_config
)

and then directly load the checkpointed parameters of the v_head and the pretrained_model like this:

import os

import torch
from transformers.modeling_utils import _load_state_dict_into_model  # private helper

# Load the full checkpoint state dict onto the CPU.
model_dict = torch.load(
    os.path.join(best_checkpoint_path, "pytorch_model.bin"),
    map_location=lambda storage, loc: storage,
)

# Split the checkpoint into the v_head weights and everything else.
v_head_state_dict = {}
pretrained_model_state_dict = {}
for k, v in model_dict.items():
    if k.startswith("v_head."):
        v_head_state_dict[k.replace("v_head.", "")] = v
    else:
        pretrained_model_state_dict[k] = v

# Restore the value head and the (LoRA-wrapped) base model separately.
policy.v_head.load_state_dict(v_head_state_dict)

_load_state_dict_into_model(
    policy.pretrained_model,
    pretrained_model_state_dict,
    start_prefix="",
)
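
For debugging, I have also tried the public load_state_dict with strict=False instead of the private helper, mainly to see which keys do not line up (this is just how I have been poking at it, not necessarily the right way):

# strict=False reports missing/unexpected keys instead of raising,
# which helps spot LoRA vs. v_head naming mismatches.
result = policy.pretrained_model.load_state_dict(
    pretrained_model_state_dict, strict=False
)
print("missing keys:", result.missing_keys[:10])
print("unexpected keys:", result.unexpected_keys[:10])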

I restore the optimizer state as well. However, when I do this, the KL restarts at 0 when training resumes from the checkpoint. To load the model weights, I first tried loading the state_dict of the adapter_model, but it is missing the v_head keys since it is just a LoRA adapter. How can I verify that training is resuming properly from the checkpoint with LoRA?
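
The closest thing to a sanity check I have come up with is comparing a few LoRA tensors from the checkpoint against what actually ends up in the loaded policy (the key matching is a guess on my part, and I am not sure this check is sufficient):

ckpt_sd = torch.load(
    os.path.join(best_checkpoint_path, "pytorch_model.bin"), map_location="cpu"
)
live_sd = policy.pretrained_model.state_dict()

# Compare a handful of LoRA tensors between the checkpoint and the loaded model.
for k in [k for k in ckpt_sd if "lora_" in k][:5]:
    if k not in live_sd:
        print(k, "-> not found in loaded model")
        continue
    same = torch.equal(ckpt_sd[k].float().cpu(), live_sd[k].float().cpu())
    print(k, "-> matches" if same else "-> DOES NOT match")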

I am using these versions:

transformers             4.48.1
trl                      0.9.6