The model isn’t ending responses with <|eot_id|>
because it wasn’t trained to do so. Just editing the config files afterward doesn’t teach the model when to stop.
Simple Solution:

- Make sure every example in your training data ends with <|eot_id|> so the model learns to generate it
- Don't mask out <|eot_id|> during training; let the model learn to predict it
- Train with the full chat template format from the start (see the sketch after this list)
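
Here's a minimal sketch of that data formatting, assuming a Hugging Face `transformers` tokenizer with the Llama 3.1 chat template; the dataset field names (`instruction`, `response`) are hypothetical placeholders for your own columns:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

def format_example(example):
    # "instruction" / "response" are placeholder field names for your data.
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    # The Llama 3.1 chat template closes every turn with <|eot_id|>,
    # so the rendered target sequence ends with the stop token.
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    assert text.endswith("<|eot_id|>"), "training example must end with the stop token"

    # The template already prepends <|begin_of_text|>, so don't add
    # special tokens a second time when tokenizing.
    input_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    # Keep <|eot_id|> in the labels (do NOT set its position to -100),
    # otherwise the model gets no loss signal for stopping.
    labels = list(input_ids)
    return {"input_ids": input_ids, "labels": labels}
```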
Alternatively, instead of starting from the base model, try fine-tuning an existing Llama 3.1 instruct model with your specific data. That way the chat format is already learned.
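
If you go that route, the only change is the starting checkpoint. A minimal sketch (the 8B model id here is just an example; use whichever size you're training):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Instruct checkpoint, not the base model: the chat template and
# <|eot_id|> stopping behavior are already baked in.
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# From here, run your usual fine-tuning (e.g. TRL's SFTTrainer) on data
# rendered with tokenizer.apply_chat_template(...) as shown above.
```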