Can SFT and DPO be done at the same time?

Hamana0509 · May 26, 2024, 7:50am

I want to do SFT and DPO at the same time.

What I mean is that I want to run SFT on the model for a few steps, and then use that same model for DPO.

Or, alternatively, create a data collator with about 15 samples and run SFT on it. Once SFT has finished on all 15 samples, create another data collator with about 5 samples and run DPO on it, then repeat this process until I have gone through the whole dataset.
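To make the schedule concrete, here is a rough sketch of the alternation only, with no training in it. The function name `interleave_batches` and the `sft_chunk` / `dpo_chunk` parameters are made up for illustration and do not come from any library; the actual SFT and DPO steps would have to be plugged in where the chunks are consumed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def interleave_batches(
    sft_data: Iterable,
    dpo_data: Iterable,
    sft_chunk: int = 15,  # hypothetical: samples per SFT phase
    dpo_chunk: int = 5,   # hypothetical: samples per DPO phase
) -> Iterator[Tuple[str, List]]:
    """Yield ("sft", chunk) and ("dpo", chunk) in turn until both sources run out."""
    sft_it, dpo_it = iter(sft_data), iter(dpo_data)
    while True:
        sft_batch = list(islice(sft_it, sft_chunk))
        dpo_batch = list(islice(dpo_it, dpo_chunk))
        if not sft_batch and not dpo_batch:
            return
        if sft_batch:
            yield "sft", sft_batch
        if dpo_batch:
            yield "dpo", dpo_batch


# Stand-in data: 40 SFT samples and 12 preference pairs.
schedule = [
    (phase, len(chunk))
    for phase, chunk in interleave_batches(range(40), range(12))
]
print(schedule)
```

Each `("sft", chunk)` would go to an SFT update and each `("dpo", chunk)` to a DPO update on the same model.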

Does anyone have any ideas that they can suggest to me?

wjmcat · June 29, 2024, 1:49am

The answer is NO.

I wonder why you ask this question. SFT doesn’t use a reference model, so you just load a single full-weight model and fine-tune it. DPO, on the other hand, also loads a copy of the model and keeps it frozen as the reference.

Different structure, different loss, different codebase, so you should not do it.
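To illustrate the "different loss" point, here is a minimal sketch of the two objectives on plain numbers. It assumes the sequence log-probabilities have already been computed elsewhere; the function and argument names are made up for illustration, and `beta=0.1` is just an example value. The SFT loss only needs the trained model's log-probability of the target, while the standard DPO loss also needs the frozen reference model's log-probabilities for both the chosen and the rejected response.

```python
import math


def sft_loss(policy_logp_target: float) -> float:
    """Negative log-likelihood of the target under the model being trained."""
    return -policy_logp_target


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,    # from the frozen reference model
    ref_logp_rejected: float,  # from the frozen reference model
    beta: float = 0.1,         # example value, an assumption
) -> float:
    """-log(sigmoid(beta * margin)), where margin compares policy vs. reference log-ratios."""
    margin = (policy_logp_chosen - ref_logp_chosen) - (
        policy_logp_rejected - ref_logp_rejected
    )
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: the margin is zero, so the loss is log(2).
same = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy prefers the chosen answer more than the reference does: the loss drops.
better = dpo_loss(-9.0, -13.0, -10.0, -12.0)

print(sft_loss(-10.0), same, better)
```

The extra `ref_logp_*` inputs are where the frozen copy of the model comes in.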

