I have collected a good amount of text data and I'm planning to fine-tune an LLM on a specific topic. The data is in two formats:
The first format is just plain text (similar to the Shakespeare dataset, I guess).
The second format is instructions (similar to the Alpaca dataset), where each entry has a question and an answer.
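
To make the two formats concrete, here is a minimal Python sketch of what one sample of each might look like. The sample content and the `to_prompt` helper are made up for illustration; the `instruction` / `input` / `output` field names follow the Alpaca convention, and the prompt layout is a simplified assumption rather than the exact Alpaca template.

```python
# Format 1: plain text, one long string per document (hypothetical sample).
raw_text = "Photosynthesis converts light energy into chemical energy."

# Format 2: Alpaca-style instruction record (hypothetical sample).
instruct_record = {
    "instruction": "What does photosynthesis convert light energy into?",
    "input": "",
    "output": "Chemical energy.",
}


def to_prompt(record):
    """Render one instruction record into a single training string.

    Simplified layout (an assumption): the optional input block is
    included only when the record has a non-empty "input" field.
    """
    parts = ["### Instruction:\n" + record["instruction"]]
    if record["input"]:
        parts.append("### Input:\n" + record["input"])
    parts.append("### Response:\n" + record["output"])
    return "\n\n".join(parts)


print(to_prompt(instruct_record))
```

The first format would be fed to the model as-is, while the second would be rendered into prompt strings like the one printed here before training.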
So is the best practice for fine-tuning an LLM to train on the first format and then on the second one?
Or should I use just the second one, since it was extracted from the first and covers a big part of it?
Thanks in advance.