Fine-tuning doesn't work / bad results

Hello,
I need your advice.

I'm trying to fine-tune Llama 2 7B (the base model) on a single task. The goal is the following: given a text, the model should replace certain words with others if they appear in the text; otherwise it should return the text unchanged.

However, at inference time the model doesn't seem to understand the task: it simply repeats the instruction and then produces a bit of nonsense.

I used the Hugging Face Trainer and formatted my prompts as ###INSTRUCTION: {} ###INPUT: {} ###RESPONSE: {}.
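For context, the prompt formatting looks roughly like this (the helper name and example strings are just illustrative; the separator tokens are the ones above). At inference I use the same format, but stop after ###RESPONSE: so the model is supposed to complete it:

```python
def build_prompt(instruction, input_text, response=None):
    # Training examples include the target response; at inference time
    # the prompt ends at "###RESPONSE:" for the model to complete.
    prompt = f"###INSTRUCTION: {instruction} ###INPUT: {input_text} ###RESPONSE:"
    if response is not None:
        prompt += f" {response}"
    return prompt

# Training example: full prompt including the target response
train_prompt = build_prompt(
    "Replace 'cat' with 'dog' if it appears in the text.",
    "The cat sat on the mat.",
    "The dog sat on the mat.",
)

# Inference: same template, ending at ###RESPONSE:
infer_prompt = build_prompt(
    "Replace 'cat' with 'dog' if it appears in the text.",
    "The cat sat on the mat.",
)
```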

I used the basic LoRA parameters found in all the tutorials, and I have about 2,000 training examples.
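Concretely, the LoRA setup is roughly the standard tutorial configuration (a sketch with the usual default values, not necessarily exactly what I ran):

```python
from peft import LoraConfig

# Typical tutorial-style LoRA config for Llama 2 7B
lora_config = LoraConfig(
    r=8,                                   # low-rank dimension
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],   # attention projections only
)
```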

Does anyone have any advice or explanation for my problem?

Thanks for your help.