"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
To the best of my knowledge, the segment below adds an eos_token to the end of every conversation turn:
{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}
This is my first time working with multi-turn conversation data, and I am wondering why an eos_token is added to the end of every turn. Wouldn’t training on data like this give the model a mistaken understanding that text can be generated even after the eos_token?
Or does this not matter during inference because the LLM programmatically cuts generation once the eos_token has been generated?
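In case it helps to see this concretely, here is a minimal sketch (assuming access to the gated meta-llama/Meta-Llama-3.1-8B-Instruct checkpoint; the example messages are made up) that renders the template for a toy multi-turn conversation so you can see where <|eot_id|> lands:

```python
from transformers import AutoTokenizer

# Gated checkpoint: requires accepting the Llama 3.1 license on the Hub.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And of Italy?"},
]

# tokenize=False returns the rendered prompt string instead of token ids,
# which makes the per-turn <|eot_id|> markers easy to inspect.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Every turn ends with <|eot_id|>, and the string finishes with
# <|start_header_id|>assistant<|end_header_id|> ready for the model's reply.
```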
The models were trained with the EOS token positioned in this way to create the illusion of a multi-turn conversation. Therefore, sticking to the prompt template used during instruction fine-tuning will keep the model behaving as expected.
The EOS token effectively helps the model understand that whoever was speaking is now done.
In some of my own experiments I have found that omitting the EOS token from my query caused the model to attempt to complete the query, whereas adding the EOS token caused it to reply to the query.
As with all things, experiment and see what happens.
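To make that experiment easy to reproduce, here is a rough sketch (same gated instruct checkpoint as above; the prompts are purely illustrative) contrasting a bare prompt with a templated one that ends the user turn with <|eot_id|>:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Generation is cut programmatically as soon as an eos_token
    # (<|eot_id|> for the instruct model) is produced.
    out = model.generate(
        **inputs, max_new_tokens=64, eos_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Bare text without an end-of-turn marker: the model tends to continue it.
print(generate("The quick brown fox"))

# Properly templated text ending the user turn with <|eot_id|>:
# the model replies instead of completing the sentence.
chat = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Finish this sentence: the quick brown fox"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(chat))
```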
@swtb
It’s interesting that you noticed this difference in behaviour between omitting and adding the EOS token.
I’ve been processing what you’ve said, and your explanation that the eos_token of the Llama-3.1-8B-Instruct model serves a different purpose from that of the Llama-3.1-8B base model does make intuitive sense to me.
As you’ve suggested, it would seem that structuring multi-turn conversations in this way would induce the model to learn that the eos_token marks the end of a turn, rather than the end of a model generation.
I’ve begun thinking that this may be why the base model and the instruct model have different eos_token values by default. The base model has its eos_token set to <|end_of_text|>, while the instruct model has its eos_token set to <|eot_id|>.
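This is easy to verify directly from the two tokenizers (again assuming access to both gated checkpoints):

```python
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
instruct = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

print(base.eos_token)      # <|end_of_text|>
print(instruct.eos_token)  # <|eot_id|>
```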
Perhaps this was an intentional choice by Meta, so as to avoid “overwriting” the information the Llama model had learned about the <|end_of_text|> token? By setting a different eos_token and ensuring that the chat_template made use of <|eot_id|>, perhaps they were able to preserve what was previously learned about the <|end_of_text|> token while inducing the behavior they desired.
If this interpretation seems off in any way, please let me know!
@Chahnwoo I think you are on the right lines. If I wanted to train certain behaviour into a model using special tokens and markers, I would definitely use a new token rather than reusing an old one.
We definitely want to leverage the base model’s language understanding for the instruction tuning, almost as an additive task that builds on the deeper knowledge from pretraining.
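For what it’s worth, here is a hypothetical sketch of that idea in transformers (the token name <|my_marker|> is made up and not part of the Llama vocabulary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # base checkpoint, also gated
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Register a brand-new special token instead of repurposing <|end_of_text|>.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|my_marker|>"]}
)

if num_added > 0:
    # Grow the embedding matrix to cover the new id; the pretrained rows,
    # including the one for <|end_of_text|>, are left untouched, and only
    # the new row needs to be learned during fine-tuning.
    model.resize_token_embeddings(len(tokenizer))
```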