Creating DPO Dataset Using Llama

Hi everyone,

I am currently working on creating a DPO dataset using Llama, and I have a question about the best practice for structuring the preference pairs.

Here’s approach 1:
Let’s say I sample 5 responses from Llama for a given prompt, and LLM-as-a-judge deems sample 5 the best. The dataset structure would look like this:
| Accept   | Reject   |
|----------|----------|
| Sample 5 | Sample 1 |
| Sample 5 | Sample 2 |
| Sample 5 | Sample 3 |
| Sample 5 | Sample 4 |

And repeat for the other prompts.
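In case a concrete sketch helps, this is roughly how I'm thinking about building the "best vs. rest" pairs for approach 1. Note that `generate_responses` and `judge_rank` are placeholder helpers standing in for the Llama sampling call and the LLM-as-a-judge ranking step, not a real library API:

```python
# Hypothetical sketch of approach 1: pair the judge's favorite response
# against every other sampled response for the same prompt.
# `generate_responses` and `judge_rank` are placeholder helpers, not a real API.

def build_best_vs_rest_pairs(prompt, generate_responses, judge_rank, n_samples=5):
    """Sample n responses, pick the judge's favorite, pair it against the rest."""
    responses = generate_responses(prompt, n=n_samples)  # e.g. 5 Llama completions
    ranked = judge_rank(prompt, responses)                # best response first
    best, rest = ranked[0], ranked[1:]
    return [
        {"prompt": prompt, "chosen": best, "rejected": r}
        for r in rest                                     # yields 4 pairs per prompt
    ]
```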

Here is approach 2:
Only 2 responses are sampled from Llama for each prompt. In this case, the structure would be:
| Accept   | Reject   |
|----------|----------|
| Sample 2 | Sample 1 |

And repeat for the other prompts.
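The equivalent sketch for approach 2, using the same placeholder helpers as above:

```python
# Hypothetical sketch of approach 2: one chosen/rejected pair per prompt.
def build_single_pair(prompt, generate_responses, judge_rank):
    """Sample two responses and keep a single chosen/rejected pair."""
    responses = generate_responses(prompt, n=2)
    ranked = judge_rank(prompt, responses)  # better response first
    return [{"prompt": prompt, "chosen": ranked[0], "rejected": ranked[1]}]
```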

My question is: which of these approaches is more effective for creating a high-quality DPO dataset? Should I stick with sampling multiple responses and pairing each of them against the best one, or is it better to sample just two responses per prompt?

Any insights or recommendations based on your experiences would be greatly appreciated!

Thanks!