Question about llama fine tuning dataset token string

Not limited to <>, special characters that have no meaning in themselves should not be passed unless there is a specific intention to do so. Generally, it is better to have as little noise as possible in data. For chat or writing models, it is better not to pass non-conversational character strings. (This does not apply when <> has a specific meaning, such as the emoticon >_<.)

However, when training a model, Chat Templates or special tokens are important, but their assignment should be automated to some extent by the Tokenizer. (This is the default behavior unless explicitly specified.)