Llama 2 repeats its prompt as output without answering the prompt

Irisjacobs · March 20, 2024, 9:15am

I followed the Hugging Face instructions (Llama 2 is here - get it on Hugging Face) but I cannot get proper output from Llama 2. It keeps repeating the prompt instead of answering it. A simple prompt without a system prompt does work, but for my use case I need to provide a system prompt.

This is my code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(base_model, low_cpu_mem_usage=True, return_dict=True, torch_dtype=torch.float16, device_map="auto")

prompt = """ <s>[INST] <<SYS>>

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.

<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST] """

import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer, 
    temperature=0.1
)


sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    max_length=500,
    return_full_text=False,
    eos_token_id=tokenizer.eos_token_id)

for seq in sequences:
    print(f"{seq['generated_text']}")

What am I doing wrong?
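
For reference, the prompt layout used above can be built with a small helper instead of being typed by hand, which avoids stray leading spaces. This is only a sketch under stated assumptions: the function name build_llama2_prompt is hypothetical, it covers a single turn, and it leaves out the literal <s> on the assumption that the Llama tokenizer prepends the BOS token itself. The [INST] / <<SYS>> layout is the one documented for the chat checkpoints (for example meta-llama/Llama-2-7b-chat-hf), so it may be worth trying the same prompt with that model; nothing in this thread confirms that as the cause.

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Return a single-turn prompt in the [INST] / <<SYS>> layout.

    Hypothetical helper, not part of transformers. No literal "<s>" is
    emitted: this assumes the tokenizer prepends the BOS token itself.
    """
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


# Shortened system prompt, for illustration only.
system_prompt = "You are a helpful, respectful and honest assistant."
prompt = build_llama2_prompt(
    system_prompt,
    "There's a llama in my garden 😱 What should I do?",
)
print(prompt)
```

The resulting string can be passed to the pipeline call in place of the hand-written prompt.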

Irisjacobs · March 20, 2024, 9:16am

For reference, this is what I get as an output when I run this code:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST] 

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST] 

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer

Yurkoff · March 20, 2024, 6:21pm

Try setting repetition_penalty=1.2, top_p=0.95, or temperature=0.8.
Experiment with these parameters, and try combining them.
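
A minimal sketch of how these settings could be collected and passed to the pipeline call from the first post. The helper name build_generation_kwargs is hypothetical, and the default of 256 new tokens is an arbitrary assumption. It uses max_new_tokens rather than max_length because max_length also counts the prompt tokens, which matters with a long system prompt.

```python
def build_generation_kwargs(
    repetition_penalty: float = 1.2,
    top_p: float = 0.95,
    temperature: float = 0.8,
    max_new_tokens: int = 256,  # arbitrary assumption, adjust as needed
) -> dict:
    """Collect sampling settings for a text-generation pipeline call.

    Hypothetical helper; the keys are standard generation arguments.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "do_sample": True,
        "repetition_penalty": repetition_penalty,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "return_full_text": False,
    }


generation_kwargs = build_generation_kwargs()
print(generation_kwargs)

# With the pipeline and prompt objects from the first post:
# sequences = pipeline(prompt, **generation_kwargs)
```

Changing one value at a time, for example build_generation_kwargs(temperature=0.5), makes it easier to see which setting affects the repetition.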

chintusingh123 · September 30, 2024, 4:09pm

Use
return_full_text=False
