Question about Llama fine-tuning dataset token strings

Sorry: no matter how many times I press the Reply button with Enter or the spacebar, the editing window where I can write a post does not appear, so I had no choice but to open a new topic.
In my previous post, I uploaded my code and asked how to fine-tune it well, and I modified it according to the answer.
Now I am experimenting with how to structure the dataset to get good results.
In my previous post I showed an example, and I received an answer saying that special characters should not be included when fine-tuning.
Here I have a small question.
GPT-3 recommended two special token strings to me, one to put where the body starts and one to put where the body ends.
However, in my previous post, the answer said not to use <> strings.
According to GPT-3's explanation, you need to input these start and end tokens so that the model can tell where the body of the novel data begins and ends.
Should I not use these tokens?
Can I just input the body without these strings?


Not limited to <>: special character strings that have no meaning in themselves should not be passed unless there is a specific intention to do so. Generally, the less noise in the data, the better. For chat or writing models, it is better not to pass non-conversational character strings. (This does not apply when <> has a real meaning in the text, such as in the emoticon >_<.)
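
As a concrete illustration of the "less noise" point, here is a minimal sketch of stripping markup-like strings from training text before fine-tuning. The regex and the sample text are my own assumptions, not something from this thread:

```python
import re

def clean_body(text: str) -> str:
    # Remove HTML/XML-style tags such as <BODY> or </BODY>, which carry
    # no meaning inside the novel text itself.
    text = re.sub(r"</?[A-Za-z][A-Za-z0-9_]*>", "", text)
    # Collapse the whitespace runs left behind by the removal.
    return re.sub(r"\s+", " ", text).strip()

# Ordinary uses of < and >, such as the emoticon >_<, are left alone.
print(clean_body("<BODY>It was a dark night. >_<</BODY>"))
# -> "It was a dark night. >_<"
```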

That said, chat templates and special tokens do matter when training a model, but inserting them should be left to the tokenizer, which handles it automatically. (This is the default behavior unless explicitly specified otherwise.)
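
For example, with the Transformers tokenizer you can see that the special tokens are added for you. This is a minimal sketch; the checkpoint name is just an example, and Llama-family repos on the Hub may require access approval:

```python
from transformers import AutoTokenizer

# Example checkpoint; substitute the model you are actually fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "Once upon a time, there was a small village by the sea."

# With the default add_special_tokens=True, the tokenizer prepends the
# BOS token (<s> for Llama 2) itself, so you do not type it into the dataset.
ids = tokenizer(text)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids)[:3])  # ['<s>', '▁Once', '▁upon']

# The raw special-token strings are available if you need to inspect them:
print(tokenizer.bos_token, tokenizer.eos_token)  # <s> </s>
```

The same applies to chat models: tokenizer.apply_chat_template inserts the model's own control tokens for you, so the dataset text itself can stay clean.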