Can SFT and DPO be done at the same time?

I want to do SFT and DPO at the same time.

What I mean is that I want to run SFT on the model for a few steps and then use that same model for DPO.

Alternatively, I could create a data collator with about 15 samples for SFT and, once all 15 samples have been trained on, create another data collator with about 5 samples for DPO, repeating this process until the whole dataset has been used.
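The alternating schedule described above can be sketched as a plain Python generator. This is only a rough illustration of the ordering, not a training loop: the function name `interleave_phases` and its parameters `sft_chunk` and `dpo_chunk` are hypothetical, and no training library is called.

```python
from itertools import islice


def interleave_phases(sft_samples, dpo_samples, sft_chunk=15, dpo_chunk=5):
    """Yield ("sft", chunk) and ("dpo", chunk) alternately.

    Hypothetical helper: it only decides which samples go to which
    phase and in what order. It stops when both datasets are exhausted.
    """
    sft_iter, dpo_iter = iter(sft_samples), iter(dpo_samples)
    while True:
        # Take the next chunk from each dataset (may be short or empty).
        sft_batch = list(islice(sft_iter, sft_chunk))
        dpo_batch = list(islice(dpo_iter, dpo_chunk))
        if not sft_batch and not dpo_batch:
            return
        if sft_batch:
            yield "sft", sft_batch
        if dpo_batch:
            yield "dpo", dpo_batch


if __name__ == "__main__":
    # Toy data: integers stand in for SFT examples and preference pairs.
    for phase, chunk in interleave_phases(range(30), range(10)):
        print(phase, len(chunk))
```

Each yielded chunk would then be handed to whichever training step matches its phase. Whether the two phases can actually share one model and one optimizer state is a separate question.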

Does anyone have any ideas that they can suggest to me?

The answer is NO.

I wonder why you ask this question. SFT doesn't use a reference model, so you load only the one full-weight model and fine-tune it. DPO, however, also loads a reference model and freezes it.

Different structure, different loss, different codebase, so you should not do it.
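To make the "different loss" point concrete, here is a minimal sketch of the two objectives written on plain floats. It assumes the sequence log-probabilities have already been computed elsewhere. The function names and the value `beta=0.1` are illustrative choices, not taken from any particular codebase.

```python
import math


def sft_loss(token_logps):
    """SFT: negative mean log-likelihood of the target tokens.

    Only the trainable model's log-probabilities are needed.
    """
    return -sum(token_logps) / len(token_logps)


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log(sigmoid(beta * margin)) for one preference pair.

    The margin compares how much the trainable policy, relative to the
    frozen reference model, prefers the chosen answer over the rejected one.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable form of -log(sigmoid(x)) = log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Note that `dpo_loss` needs four log-probabilities per pair, two of them from the frozen reference model, while `sft_loss` needs only the trainable model's outputs on a single sequence.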
