Issue with LLaMA-2 Chat Template (and out-of-date documentation)

Hi @Rocketknight1, I see that you added the chat_template data for the LLaMA-2 models. There appears to be a bug in that logic: if you only pass in a system prompt, applying the template returns an empty string/list. For example, the code below prints an empty string:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
chat = [
  {"role": "system", "content": "You are a helpful and honest assistant."},
]
print(tokenizer.apply_chat_template(chat, tokenize=False))

However, if you edit it to include an empty user message, then it outputs the system prompt along with the empty user input (which in LLaMA comes with an appended "[/INST]"):

chat = [
  {"role": "system", "content": "You are a helpful and honest assistant."},
  {"role": "user", "content": ""},
]
print(tokenizer.apply_chat_template(chat, tokenize=False))

This is a problem for scenarios where I only want to retrieve the LLaMA-formatted system prompt.

Another miscellaneous comment: the chat_completion link in the meta-llama/Llama-2-13b-chat-hf model card on Hugging Face points to line 212, which I think should now be line 284.

Overall, love the addition of chat templates and I look forward to increasing their usage in my codebase!


Hi @njbrake, thanks for the notification! We’ve been able to reproduce the issue here.

Partly, this is caused by us not testing that case, because (I believe) LLaMA-2 was never trained with ‘naked’ system messages like this. However, you’re right that the template should support it properly. I’ll see if I can push a fix soon!

Regarding the incorrect link in the LLaMA documentation, though, that’s an issue you’ll have to take up with the repository maintainers rather than Hugging Face! Try opening an issue on the repo to alert them.

Hi @njbrake, can you try this out, both with just a single system message and with more complex conversations? It should yield the same results as the old template in most cases, but should give proper output when there’s just a single system message now:

tokenizer.chat_template = (
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages[1:] %}"  # Extract system message if it's present
    "{% set system_message = messages[0]['content'] %}"
    "{% elif USE_DEFAULT_PROMPT == true and not '<<SYS>>' in messages[0]['content'] %}"
    "{% set loop_messages = messages %}"  # Or use the default system message if the flag is set
    "{% set system_message = 'DEFAULT_SYSTEM_MESSAGE' %}"
    "{% else %}"
    "{% set loop_messages = messages %}"
    "{% set system_message = false %}"
    "{% endif %}"
    "{% if loop_messages|length == 0 and system_message %}"  # Special handling when only sys message present
    "{{ bos_token + '[INST] <<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n [/INST]' }}"
    "{% endif %}"
    "{% for message in loop_messages %}"  # Loop over all non-system messages
    "{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
    "{% endif %}"
    "{% if loop.index0 == 0 and system_message != false %}"  # Embed system message in first message
    "{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}"
    "{% else %}"
    "{% set content = message['content'] %}"
    "{% endif %}"
    "{% if message['role'] == 'user' %}"  # After all of that, handle messages/roles in a fairly normal way
    "{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}"
    "{% elif message['role'] == 'system' %}"
    "{{ '<<SYS>>\\n' + content.strip() + '\\n<</SYS>>\\n\\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ ' ' + content.strip() + ' ' + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

After you run that block, try apply_chat_template and it should have the new behaviour!
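
For example, re-running the system-only snippet from the first post should now print a formatted prompt rather than an empty string:

# Same system-only chat as in the original report above
chat = [
  {"role": "system", "content": "You are a helpful and honest assistant."},
]
print(tokenizer.apply_chat_template(chat, tokenize=False))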

Hi @Rocketknight1, thanks for the quick response! It looks like that didn't quite do it. Now I get a response, but it still has the [/INST] at the end of it:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.
<</SYS>>

 [/INST]

I think it shouldn’t have the newline or the [/INST] at the end?

Hi @njbrake - we talked this over a bit more. In the end, we decided that actually we just shouldn’t be trying to do this! LLaMA enforces a strict rule that chats must alternate user/assistant/user/assistant, and the system message, if present, should be embedded into the first user message. As a result, it’s not clear how the chat template should even handle the case of a solo system message with no user message. We suspect that no matter what you do here, model performance will probably be damaged because the input is different from the formats the model was trained with.

My recommendation would be to incorporate a user message as well as a system message. Alternatively, you can try a different model. Several new models like Zephyr have excellent performance and have simpler handling of system messages, as well as lifting the strict message ordering requirement of LLaMA.
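
To illustrate the first option, something like this keeps the training format intact (the user question here is just a placeholder):

chat = [
  {"role": "system", "content": "You are a helpful and honest assistant."},
  {"role": "user", "content": "What is the capital of France?"},
]
print(tokenizer.apply_chat_template(chat, tokenize=False))
# The system prompt gets embedded into the first [INST] ... [/INST] block,
# which matches the format LLaMA-2 was trained with.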

Hi @Rocketknight1, unfortunately this is a common use case for RAG frameworks like llama-index, which expect a formatted system_prompt to be returned before there is any user input.

I haven’t checked, but I would imagine LangChain has a similar parameter. I think these frameworks may keep their own definitions of the LLaMA system prompt format, which I could use, but I was hoping to be able to use the Hugging Face chat_template to access the system prompt formatting.

If you can find out the system prompt format they use, I can help write a chat template to get that to work for you. I’m still uncertain about updating the official LLaMA template, though - even if llama-index expects it, it (imo) violates the principle that the chat template should preserve the format used in training.

Still, maybe it’s better than just throwing an error. Let me know if you can find the prompt format they use for a solo system message!

@Rocketknight1 your explanation makes sense.

Interestingly, I found an example in the LlamaIndex docs where they avoid the “system_prompt” parameter by passing in a “query_wrapper_prompt” instead. Go figure 🙂 https://github.com/run-llama/llama_index/blob/9672370a2a0c87dee77195f4a518db7b511fc2ed/docs/examples/vector_stores/SimpleIndexDemoLlama-Local.ipynb#L207
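
If it helps anyone else, the general idea of that query wrapper can be sketched in plain Python like this (my own hypothetical sketch using the <<SYS>> format from earlier in the thread, not the actual LlamaIndex API):

# Hypothetical sketch (not the LlamaIndex API): bake the system prompt and a
# {query_str} placeholder into a single LLaMA-2 [INST] block.
SYSTEM_PROMPT = "You are a helpful and honest assistant."
QUERY_WRAPPER = (
    "<s>[INST] <<SYS>>\n" + SYSTEM_PROMPT + "\n<</SYS>>\n\n{query_str} [/INST]"
)

def wrap_query(query_str: str) -> str:
    # Substitute the user's query at request time.
    return QUERY_WRAPPER.format(query_str=query_str)

print(wrap_query("What is the capital of France?"))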

So in that case I think I agree with you: there’s no need to change the chat_template to support my use case, but throwing an error if someone tries to do what I was doing would be a nice feature to add. It was confusing for the function to return an empty string instead of offering some feedback about why it wasn’t working.
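
Something along these lines could do it; just a sketch that reuses the raise_exception helper already used in the template above for the role-alternation check:

# Sketch only: a branch to splice into the template above that raises instead
# of silently producing nothing when only a system message is present.
error_branch = (
    "{% if loop_messages|length == 0 %}"
    "{{ raise_exception('LLaMA-2 chats need at least one user message; a lone system message cannot be formatted') }}"
    "{% endif %}"
)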

Thanks!