or that affects only speed
Basically, this should be the case, and there are few cases where you get half-baked results due to insufficient hardware performance. It’s either it works or it doesn’t, and it’s either fast or slow.
I found the official HF implementation for Llama2. It may be that tokenizer.use_default_system_prompt = False is meaningful.
Since Llama2 has been around for a long time, it has been affected by various HF specification changes, so there is likely to be some confusion about how to use it.