Llama-2 7B-hf repeats the question context from the input prompt, then cuts off with newlines

I'm having more or less the same issue. Since I sometimes expect a long answer, I set max_new_tokens to a high value. But when the answer turns out to be short, the model gives its response and then keeps echoing parts of my input prompt until it hits the max_new_tokens limit. I've seen Llama-1 examples where the model produces both short and long answers without padding the output with nonsense tokens just to reach max_new_tokens. Did I do something wrong during fine-tuning?
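
For context, this is roughly what my generation call looks like. It's a minimal sketch: the model name, prompt, and max_new_tokens value are placeholders (the real model is my fine-tuned checkpoint), but the generate arguments are the ones I'm actually using:

```python
# Minimal repro sketch -- model name, prompt, and max_new_tokens are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; mine is fine-tuned from this
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# With a high max_new_tokens, generation runs past the short answer and
# starts echoing the prompt instead of stopping at the EOS token.
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,  # I'd expect generation to stop here
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```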