Can I do DPO training on a synthetic dataset?

Mustafa21 · December 6, 2023, 5:55pm

I am following the fine-tuning recipe for the Hugging Face model Zephyr 7B, which applies two fine-tuning methods, SFT and DPO, on a public dataset. I am currently fine-tuning a 7B model with SFT, and that is progressing well. My question is whether it is acceptable to then run DPO on a preference dataset generated synthetically by GPT-3.5.
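To make the question concrete, here is a rough sketch of what one synthetic preference record might look like. The `prompt` / `chosen` / `rejected` field names follow the convention I understand TRL's `DPOTrainer` to use, so treat them as an assumption and check the version you have installed. `make_preference_record` is a hypothetical helper, and the example texts are made up.

```python
def make_preference_record(prompt, chosen, rejected):
    """Build one preference pair (hypothetical helper, not a library function)."""
    fields = (("prompt", prompt), ("chosen", chosen), ("rejected", rejected))
    for name, text in fields:
        # Every field must be a non-empty string.
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"{name} must be a non-empty string")
    # A pair with identical responses carries no preference signal.
    if chosen == rejected:
        raise ValueError("chosen and rejected must differ")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Made-up example: both responses would come from GPT-3.5 in this setup.
record = make_preference_record(
    prompt="Explain what SFT stands for.",
    chosen="SFT stands for supervised fine-tuning.",
    rejected="SFT is a kind of file format.",
)
```

A list of such records is the whole dataset; nothing in the record says which model wrote the responses.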

From my understanding, DPO should be trained on answers produced by the same model that is being fine-tuned. Can anyone confirm this, and has anyone attempted this kind of fine-tuning before?
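For context, this is a minimal sketch of the DPO loss for a single preference pair, assuming the standard formulation: the negative log-sigmoid of the scaled difference in log-probability ratios between the policy and a frozen reference model. The function name, the `beta` default, and the log-probability values are illustration choices, not taken from the Zephyr recipe. The formula only consumes log-probabilities of the chosen and rejected responses, so it does not by itself require that those responses were sampled from the model being trained; whether such off-policy pairs work well in practice is exactly what I am asking.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given sequence log-probs."""
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy has moved toward the chosen response: loss drops below log(2).
improved = dpo_loss(-10.0, -17.0, -12.0, -15.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's.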
