Autotrain training data format (text column)

Tim793 · November 3, 2023, 12:13pm

Hi all,
I am fine-tuning the llama2-7b-hf model using AutoTrain Advanced.
My goal is to give the model an input prompt that defines a topic and have it generate a single-choice question with 4 answer options.

My current status is:
My CSV training file includes a text column which is formatted as follows:
"Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“The Benefits of B2C E-Commerce Platforms”

###Question:

The online platform used in B2C e-commerce allows for______

a) selling products and services directly to consumers

b) creating a physical storefront

c) in-person payment methods

d) selling products and services to businesses"

I have 200 rows like this, and I trained for 60 epochs.
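For reference, a text column in the format shown above could be produced with a small script like the one below. This is only a sketch: the helper names `build_text` and `write_csv` and the `topic`/`question`/`options` keys are hypothetical and not part of AutoTrain, and the only thing taken from the setup described here is that the CSV has a single column named `text`.

```python
import csv
import io

# Instruction line copied from the training example above.
INSTRUCTION = (
    "Below you will find a topic. Create a Single choice question that queries "
    "the given topic. Create 3 wrong answer possibilities and 1 correct one."
)


def build_text(topic, question, options):
    """Format one training example in the layout shown above (hypothetical helper)."""
    letters = "abcd"
    option_lines = "\n\n".join(
        f"{letters[i]}) {option}" for i, option in enumerate(options)
    )
    return (
        f"{INSTRUCTION}\n\n"
        f"###topic:\n\n\"{topic}\"\n\n"
        f"###Question:\n\n{question}\n\n"
        f"{option_lines}"
    )


def write_csv(rows, fileobj):
    """Write one CSV row per example into a single 'text' column."""
    writer = csv.DictWriter(fileobj, fieldnames=["text"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"text": build_text(**row)})


if __name__ == "__main__":
    example = {
        "topic": "The Benefits of B2C E-Commerce Platforms",
        "question": "The online platform used in B2C e-commerce allows for______",
        "options": [
            "selling products and services directly to consumers",
            "creating a physical storefront",
            "in-person payment methods",
            "selling products and services to businesses",
        ],
    }
    buffer = io.StringIO()
    write_csv([example], buffer)
    print(buffer.getvalue())
```

The `csv` module quotes fields that contain newlines or double quotes, so each multi-line example stays inside one cell.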

After training, my fine-tuned model behaves as follows:

Prompt:
“The Benefits of B2C E-Commerce Platforms”

Output:
"The Benefits of B2C E-Commerce Platforms

Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“Enhancing Customer Experience through B2C E-Commerce Platforms”

###Question:

B2C e-commerce platforms offer businesses a way to______their customers’ shopping experience.

a) increase

b) improve

c) decrease

d) minimize Below you will find a topic. Create a Single choice question that queries the given topic. Create 3 wrong answer possibilities and 1 correct one.

###topic:

“Improving Customer Satisfaction through B2C E-Commerce”

###Question:

B2C e-commerce platforms allow customers to______and compare products before making a purchase.

a) research

b) browse

c) return"

So my problems are:

The model does not stop generating until it reaches my token limit.
The goal is to output only one question with its answers. Instead, the model repeats the topic, makes up new topics, and repeats my training text.
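One thing that may be worth checking is whether the inference prompt has the same shape as the training text: above, the model is prompted with the bare topic, while every training example starts with the instruction and the `###topic:` / `###Question:` markers. The sketch below is not a confirmed fix. It assumes the template from the training example, and the helper names `build_prompt` and `keep_first_question` are hypothetical. The first function rebuilds the prompt up to `###Question:`, and the second trims a generated completion down to the first question as a stopgap for the runaway output.

```python
import re

# Instruction line copied from the training example above.
INSTRUCTION = (
    "Below you will find a topic. Create a Single choice question that queries "
    "the given topic. Create 3 wrong answer possibilities and 1 correct one."
)


def build_prompt(topic):
    """Rebuild the training template up to the point where the question starts."""
    return f"{INSTRUCTION}\n\n###topic:\n\n\"{topic}\"\n\n###Question:\n\n"


def keep_first_question(completion):
    """Trim generated text to the first question and its a) to d) options."""
    # Drop everything from the point where the instruction text is repeated.
    cut = completion.find("Below you will find a topic.")
    if cut != -1:
        completion = completion[:cut]
    # Drop anything after the end of the first line that starts with "d)".
    match = re.search(r"^d\)[^\n]*", completion, flags=re.MULTILINE)
    if match:
        completion = completion[: match.end()]
    return completion.strip()


if __name__ == "__main__":
    print(build_prompt("The Benefits of B2C E-Commerce Platforms"))
    raw = (
        "B2C e-commerce platforms offer businesses a way to______their "
        "customers' shopping experience.\n\na) increase\n\nb) improve\n\n"
        "c) decrease\n\nd) minimize Below you will find a topic. Create a "
        "Single choice question that queries the given topic."
    )
    print(keep_first_question(raw))
```

Trimming the output only hides the symptom. It does not teach the model where an example ends.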

I assume the problem lies in my training data. Or maybe somewhere else?

I hope someone can help me out with that or provide some tips on how to improve the outcome.
