Llama-2 7B-hf repeats the question context from the input prompt, then cuts off with newlines

I'm having more or less the same issue. Since I sometimes expect a long answer, I set max_new_tokens to a high value. But when the answer turns out to be short, the model gives its response and then keeps echoing parts of my input prompt until it hits the max_new_tokens limit. I've seen Llama-1 examples where the model produces both short and long answers without padding the output with nonsense tokens just to reach max_new_tokens. Did I do something wrong during fine-tuning?
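
For context, this is roughly what my generation call looks like. It's a minimal sketch: the model name, prompt, and max_new_tokens value are placeholders (the real model is my fine-tuned checkpoint), but the generate arguments are the ones I'm actually using:

```python
# Minimal repro sketch -- model name, prompt, and max_new_tokens are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; mine is fine-tuned from this
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# With a high max_new_tokens, generation runs past the short answer and
# starts echoing the prompt instead of stopping at the EOS token.
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,  # I'd expect generation to stop here
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```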