Hello
I have a dataset consisting of dialogues between two people which I would like to use for fine-tuning GPT-J (EleutherAI/gpt-j-6B). Please see below for two example dialogues. The dialogues vary in length and can be longer than the examples.
Is the format of the conversations ok? For fine-tuning, should I just concatenate all conversations into one big file or do I have to use a separator between the conversations (if yes, which separator)?
First Dialogue:
user1:
Hey there. What’s up?
user2:
Not much, just hanging out. What about you?
user1:
Just thinking about what I’m going to do this weekend. You?
user2:
Probably just relaxing. What do you have planned?
user1:
I’m thinking about going to the beach. It’s supposed to be nice this weekend.
user2:
That sounds like a great plan! Have you been to the beach recently?
user1:
Not in a while. It would be nice to get out and enjoy the sun.
user2:
Definitely! I’m sure it’ll be a great time. Do you have any other ideas for the weekend?
Second Dialogue:
user1:
Good morning. What is your profession?
user2:
Good morning. I’m an accountant. What about you?
user1:
I’m a software engineer. How long have you been an accountant?
user2:
I’ve been an accountant for about five years now. What about you? How long have you been a software engineer?
user1:
I’ve been a software engineer for three years. What do you like most about accounting?
user2:
I like how challenging it can be. There’s always something to learn or something new to figure out. What do you like most about software engineering?
user1:
I like how creative it can be. I get to come up with new ideas and new ways of solving problems. It’s a great feeling when you can come up with something that works.
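
One possible way to join dialogues like these into a single training text is sketched below. It is only an illustration under stated assumptions: the separator is assumed to be `<|endoftext|>`, the end-of-text token of the GPT-2 style tokenizer that GPT-J uses, and the helper names `format_dialogue` and `build_corpus`, as well as the (speaker, text) tuple layout, are hypothetical and not part of any library.

```python
# Assumption: "<|endoftext|>" is used as the separator between dialogues.
# It is the end-of-text token of the GPT-2 style tokenizer used by GPT-J.
EOS = "<|endoftext|>"


def format_dialogue(turns):
    """Render one dialogue: a 'speaker:' line followed by the utterance line."""
    lines = []
    for speaker, text in turns:
        lines.append(f"{speaker}:")
        lines.append(text.strip())
    return "\n".join(lines)


def build_corpus(dialogues, separator=EOS):
    """Concatenate all dialogues, ending each one with the separator token."""
    return "".join(format_dialogue(d) + "\n" + separator + "\n" for d in dialogues)


# Hypothetical sample data in (speaker, text) form.
dialogues = [
    [("user1", "Hey there. What's up?"),
     ("user2", "Not much, just hanging out. What about you?")],
    [("user1", "Good morning. What is your profession?"),
     ("user2", "Good morning. I'm an accountant. What about you?")],
]

corpus = build_corpus(dialogues)
print(corpus)
```

The resulting string could be written to one text file. Ending every dialogue with the same token marks where one conversation stops and the next begins, so unrelated conversations are not presented as one continuous exchange.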