What is the text dataset format for fine-tuning an LLM?

I want to fine-tune an LLM on my own dataset, which focuses on a specific subject.

I have thousands of text files about this subject.

What is the required dataset format for an LLM?

Is it just plain text files?

I have converted some of them to Q&A pairs in JSON format, like this:
{
  "Q": "Question",
  "A": "Answer",
  "ID": "QuestionID"
}

Perhaps you can have a look here:

And try to find the one that is most similar to your use case.
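
For illustration, here is a minimal sketch of one layout that is often used for Q&A-style fine-tuning data: JSON Lines, with one JSON object per line. The `Q`, `A`, and `ID` keys come from the question above; the output field names `id`, `prompt`, and `completion` and the helper `qa_to_jsonl` are assumptions made for this example, so check which field names your training tool actually expects.

```python
import json


def qa_to_jsonl(records):
    """Turn Q/A/ID dicts into JSON Lines text, one example per line.

    The output keys (id, prompt, completion) are illustrative assumptions,
    not a format required by any particular library.
    """
    lines = []
    for rec in records:
        example = {
            "id": rec["ID"].strip(),          # drop stray whitespace around the ID
            "prompt": rec["Q"].strip(),       # the question becomes the prompt
            "completion": rec["A"].strip(),   # the answer becomes the target text
        }
        # ensure_ascii=False keeps non-English text readable in the file
        lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Example record shaped like the one in the question
records = [
    {"Q": "Question", "A": "Answer", "ID": "QuestionID"},
]

jsonl_text = qa_to_jsonl(records)
print(jsonl_text, end="")
```

The resulting text can be written to a `.jsonl` file. Keeping one example per line means the file can be read line by line without loading everything into memory, which helps with thousands of source files.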


Thanks, got it