Autotrain training data format (text column)

Hi all,
I am fine-tuning the llama2-7b-hf model using AutoTrain Advanced.
My goal is to give the model an input prompt that defines a topic and have it generate a single-choice question with 4 answer options as output.

My current status is:
My CSV training file includes a text column which is formatted as follows:
"Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“The Benefits of B2C E-Commerce Platforms”

###Question:

The online platform used in B2C e-commerce allows for______

a) selling products and services directly to consumers

b) creating a physical storefront

c) in-person payment methods

d) selling products and services to businesses"

I have 200 rows like this, and I trained for 60 epochs.
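
For reference, here is a minimal sketch of how a text column in this layout could be assembled and written to a CSV file. It only reproduces the format shown above; the helper names `build_training_text` and `write_csv` are hypothetical and not part of AutoTrain, and the single `text` column header is taken from the description above.

```python
import csv
import io

# Instruction text copied verbatim from the training sample above.
INSTRUCTION = (
    "Below you will find a topic. Create a Single choice question that "
    "queries the given topic. Create 3 wrong answer possibilities and 1 correct one."
)


def build_training_text(topic, question, options):
    """Return one 'text' cell in the layout shown above (hypothetical helper)."""
    lines = [INSTRUCTION, "###topic:", f"“{topic}”", "###Question:", question]
    # Label the four answer options a) to d), one per block.
    lines += [f"{letter}) {option}" for letter, option in zip("abcd", options)]
    # Blocks are separated by a blank line, as in the sample.
    return "\n\n".join(lines)


def write_csv(rows, handle):
    """Write the samples to a CSV with a single 'text' column."""
    writer = csv.DictWriter(handle, fieldnames=["text"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"text": build_training_text(**row)})


if __name__ == "__main__":
    sample = {
        "topic": "The Benefits of B2C E-Commerce Platforms",
        "question": "The online platform used in B2C e-commerce allows for______",
        "options": [
            "selling products and services directly to consumers",
            "creating a physical storefront",
            "in-person payment methods",
            "selling products and services to businesses",
        ],
    }
    buffer = io.StringIO()
    write_csv([sample], buffer)
    print(buffer.getvalue())
```

Each cell contains embedded newlines, so the `csv` module quotes the whole field automatically.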

After training, my fine-tuned model behaves as follows:

Prompt:
“The Benefits of B2C E-Commerce Platforms”

Output:
"The Benefits of B2C E-Commerce Platforms

Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“Enhancing Customer Experience through B2C E-Commerce Platforms”

###Question:

B2C e-commerce platforms offer businesses a way to______their customers’ shopping experience.

a) increase

b) improve

c) decrease

d) minimize Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“Improving Customer Satisfaction through B2C E-Commerce”

###Question:

B2C e-commerce platforms allow customers to______and compare products before making a purchase.

a) research

b) browse

c) return"

So my problems are:

  • The model does not stop generating until it reaches my given token limit.
  • The goal is to output only one question with its answers. Instead, the model repeats the topic, makes up new topics, and repeats my training text.
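
One thing that stands out is that the prompt above is only the bare topic, while every training sample starts with the instruction text. A minimal sketch of a workaround, assuming the prompt at inference time should mirror the training layout, is shown below. The helper names `build_prompt` and `keep_first_question` are hypothetical. The trimming step only cuts the text after generation; it does not make the model itself stop earlier, so it does not address the root cause.

```python
# Instruction text copied verbatim from the training sample.
INSTRUCTION = (
    "Below you will find a topic. Create a Single choice question that "
    "queries the given topic. Create 3 wrong answer possibilities and 1 correct one."
)


def build_prompt(topic):
    """Wrap a topic in the training template, ending right after the question header."""
    return f"{INSTRUCTION}\n\n###topic:\n\n“{topic}”\n\n###Question:\n\n"


def keep_first_question(generated, prompt):
    """Drop the echoed prompt and everything from the first repeated instruction on."""
    # Causal LMs usually return the prompt followed by the continuation.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    # The runaway output restarts with the instruction text, so cut there.
    cut = completion.find("Below you will find a topic.")
    if cut != -1:
        completion = completion[:cut]
    return completion.strip()


if __name__ == "__main__":
    prompt = build_prompt("The Benefits of B2C E-Commerce Platforms")
    # Made-up model output used only to exercise the trimming logic.
    fake_output = (
        prompt
        + "Example question______\n\na) one\n\nb) two\n\nc) three\n\nd) four "
        + "Below you will find a topic. Create a Single choice question"
    )
    print(keep_first_question(fake_output, prompt))
```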

I assume the problem lies in my training data, but it could also be somewhere else.

I hope someone can help me out with that or provide some tips on how to improve the outcome.