Identical Evaluation Metrics for SFT & DPO–Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B

If the evaluation metrics come out identical for both the SFT and the DPO adapters, then, although I can't say for certain why, my guess is that the adapter weights are never actually being overwritten during training. For example, requires_grad=False may be set on the LoRA parameters. Passing is_trainable=True when loading the adapter may also be necessary:

from peft import PeftModel

model = PeftModel.from_pretrained(model, peft_dir, is_trainable=True).to(device)
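To check whether the adapter is actually being updated, you can inspect requires_grad on the LoRA parameters and snapshot one weight before and after a training run. Below is a minimal sketch, assuming a SeaLLMs-v3-7B base model and a saved adapter directory; the model id and the peft_dir path are illustrative, not taken from your setup:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
peft_dir = "path/to/lora-adapter"  # illustrative path; use your adapter dir

# Load the base model and attach the saved LoRA adapter as trainable.
base = AutoModelForCausalLM.from_pretrained("SeaLLMs/SeaLLMs-v3-7B")  # assumed model id
model = PeftModel.from_pretrained(base, peft_dir, is_trainable=True).to(device)

# 1) If every LoRA parameter reports requires_grad=False, the trainer
#    silently leaves the adapter untouched, and the SFT and DPO
#    evaluation metrics will come out identical.
for name, param in model.named_parameters():
    if "lora_" in name:
        print(name, param.requires_grad)

# 2) Snapshot one LoRA weight, run training, then compare. If the two
#    tensors are equal, the weights were never overwritten.
probe_name, probe = next((n, p) for n, p in model.named_parameters() if "lora_" in n)
before = probe.detach().clone()

# ... run your DPO (or SFT) training here ...

after = probe.detach()
print(probe_name, "changed:", not torch.equal(before, after))

If the comparison reports that nothing changed, the problem is in how the model is handed to the trainer (e.g., the adapter was loaded without is_trainable=True), not in your evaluation code.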