I am following the fine-tuning recipe for the Hugging Face Zephyr 7B model, which applies two fine-tuning methods, SFT and DPO, on a public dataset. The SFT stage on my 7B model is progressing well. However, I have a question: is it acceptable to run DPO on a preference dataset generated synthetically by GPT-3.5?
From my understanding, DPO should be trained on answers produced by the model being fine-tuned itself. I would like to confirm whether that is correct, and to ask whether anyone has attempted this kind of fine-tuning before.
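For context, here is a minimal sketch of the kind of data I have in mind. It assumes the common preference-pair format (prompt / chosen / rejected fields) used by TRL-style DPO training; the example texts are made up, and in my case both responses would come from GPT-3.5 rather than from my SFT model.

```python
# Sketch of a GPT-3.5-generated preference dataset in prompt/chosen/rejected form.
# The records below are hypothetical placeholders, not real data.
from datasets import Dataset

pairs = [
    {
        "prompt": "Summarize what DPO fine-tuning does.",
        "chosen": "DPO optimizes the model directly on preference pairs, pushing it toward the preferred answer...",   # response GPT-3.5 rated higher
        "rejected": "DPO is just a different learning-rate schedule applied during pretraining...",                     # response GPT-3.5 rated lower
    },
]

# This Dataset would then be passed as the training set to a DPO trainer.
dpo_dataset = Dataset.from_list(pairs)
```

My concern is only about where these chosen/rejected responses come from, not about the format itself.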