Hi all,
I’m currently training a model with PPOTrainer and LoRA. When I call `model.save_pretrained(…)`, it saves both an `adapter_model.safetensors` and a `pytorch_model.bin`. What is the difference between the two? Both end up in the same directory, but when I load the model via `from_pretrained`, it seems to use the `adapter_model`. Does the `pytorch_model.bin` also contain the LoRA adapters merged in?
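For reference, here is a minimal sketch of how I inspect which keys each file actually contains, just to make the question concrete (the checkpoint directory below is a placeholder path):

```python
import os

import torch
from safetensors import safe_open

checkpoint_dir = "path/to/checkpoint"  # placeholder

# Parameter names stored in the LoRA adapter file
with safe_open(os.path.join(checkpoint_dir, "adapter_model.safetensors"), framework="pt") as f:
    adapter_keys = list(f.keys())

# Parameter names stored in pytorch_model.bin
bin_keys = list(
    torch.load(os.path.join(checkpoint_dir, "pytorch_model.bin"), map_location="cpu").keys()
)

print(len(adapter_keys), "adapter keys, e.g.", adapter_keys[:3])
print(len(bin_keys), "pytorch_model.bin keys, e.g.", bin_keys[:3])
```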
Additionally, I want to continue PPO training from a checkpoint. I load the checkpoint like this:

```python
from trl import AutoModelForCausalLMWithValueHead

policy = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
    quantization_config=nf4_config,
)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name, quantization_config=nf4_config
)
```
and then directly load the model parameters of the `v_head` and `pretrained_model` like this:

```python
import os

import torch
# Assuming the private helper comes from transformers.modeling_utils
from transformers.modeling_utils import _load_state_dict_into_model

model_dict = torch.load(
    os.path.join(best_checkpoint_path, "pytorch_model.bin"),
    map_location=lambda s, t: s,
)

# Split the checkpoint into v_head weights and base-model weights
v_head_state_dict = {}
pretrained_model_state_dict = {}
for k, v in model_dict.items():
    if k.startswith("v_head."):
        v_head_state_dict[k.replace("v_head.", "")] = v
    else:
        pretrained_model_state_dict[k] = v

policy.v_head.load_state_dict(v_head_state_dict)
_load_state_dict_into_model(
    policy.pretrained_model,
    pretrained_model_state_dict,
    start_prefix="",
)
```
I restore the optimizer state as well. However, when I do this, the KL also restarts at 0 when resuming training from the checkpoint. To load the model weights, I first tried loading the state_dict of the `adapter_model`, but it’s missing the keys for the `v_head`, since it’s just a LoRA adapter. How can I verify that training is resuming properly from the checkpoint with LoRA?
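The only check I can think of is fingerprinting the parameters myself. Here is a minimal sketch (the helper name and the fingerprint file are placeholders of mine, not part of TRL or PEFT): compute a cheap checksum of the LoRA and v_head parameters right after saving the checkpoint, then recompute it right after loading in the resumed run and compare.

```python
import torch


def lora_and_vhead_fingerprint(model):
    """A cheap checksum: sum of absolute values for each LoRA / v_head parameter."""
    return {
        name: param.detach().float().abs().sum().item()
        for name, param in model.named_parameters()
        if "lora_" in name or "v_head" in name
    }


# First run, right after saving the checkpoint:
torch.save(lora_and_vhead_fingerprint(policy), "checkpoint_fingerprint.pt")

# Resumed run, right after loading the checkpoint:
saved = torch.load("checkpoint_fingerprint.pt")
resumed = lora_and_vhead_fingerprint(policy)
for name, value in saved.items():
    assert name in resumed and abs(value - resumed[name]) < 1e-4, f"mismatch: {name}"
```

But I’m not sure whether this is the right approach.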
I am using transformers 4.48.1 and trl 0.9.6.