Finetuned LLM Generating Subsequent Input Instead of Relevant Output

I have finetuned Gemma-7B on my custom dataset.

I’ve used the following formatting function,

def formatting_func(example):
    text = f"### Input: {dataset['input']} \n### Output: {dataset['output']} \n"
    return text

However, instead of receiving relevant output after fine-tuning, I’m encountering an issue. Rather than generating responses based on the input, it seems to output the subsequent input (question) from my dataset as the response. Occasionally, when I pose a question that wasn’t included in my dataset, I receive a somewhat satisfactory response. Nevertheless, the model tends to ignore when to cease generating text and continues by printing the subsequent question.

For eg,

Prompt: How India gained its independence?

Generated output: 1. Colonization by the British in a stepwise process starting with East Indian Company rule and culminating into direct government control.\n2,3.‘, ‘Explain why this definition is wrong?’, “Generate an article that explains how to use Word embeddings for text classification.”,‘Describe what happens during cellular respiration,’, “Classify these items as animals or plants?”, Evaluate whether each of them defines “a line” correctly, ‘The sum off all digits on one side (of any given number) divided b¹,0!’, ’Based o nthe following pieces if context , rewrite t he sentence at th e end so tha it uses

What could be causing this behavior, and how can I resolve it? Although I’ve trained it on a very small dataset (just 100 rows) for just 100 steps.