DPO with Chat Data


I am curious about training an LLM with DPO on a chat dataset containing messages between a user and an assistant. I want to build a DPO dataset with ‘prompt’, ‘chosen’, and ‘rejected’ fields, where each ‘chosen’ entry is the assistant’s actual response and each ‘rejected’ entry is generated by an SFT model I trained. However, I’m having difficulty constructing this dataset. Should each assistant turn in a chat be treated as a separate data sample, with the entire preceding chat history in its prompt? Or is there a more efficient way to format these prompts for DPO? Any guidance would be greatly appreciated!
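For reference, here is a minimal sketch of the per-turn splitting I have in mind, in Python. `generate_rejected` is a hypothetical placeholder for sampling from my SFT model, and the field layout (lists of `{"role", "content"}` dicts) is just one plausible convention:

```python
def generate_rejected(history):
    # Hypothetical placeholder: in practice, this would run the SFT model
    # on the chat history and return its sampled response.
    return "<SFT model response>"

def chat_to_dpo_samples(messages):
    """Split one chat into DPO samples: one sample per assistant turn,
    with all preceding messages as the prompt."""
    samples = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            continue
        history = messages[:i]  # everything before this assistant turn
        samples.append({
            "prompt": history,
            "chosen": [msg],  # the dataset's real assistant reply
            "rejected": [{"role": "assistant",
                          "content": generate_rejected(history)}],
        })
    return samples

chat = [
    {"role": "user", "content": "Hi, what's DPO?"},
    {"role": "assistant", "content": "Direct Preference Optimization is..."},
    {"role": "user", "content": "How do I build the dataset?"},
    {"role": "assistant", "content": "One common approach is..."},
]
samples = chat_to_dpo_samples(chat)
print(len(samples))  # one sample per assistant turn
```

With this scheme, a chat with N assistant turns yields N preference pairs, and later samples carry progressively longer prompts, which is the redundancy the question above is asking whether to avoid.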