Data problem for live support for my e-commerce site

MVX20 · October 28, 2024, 8:29pm

Hello friends,

I recorded all human-customer conversations on my e-commerce site in json form. This data is as follows:
{
dialog_1 = [
{“role”:“user”,“content”:“hello”},
{“role”: “user”, “content”: “are you online”},
{“role”: “user”, “content”: “can you help me ?”},
{“role”: “assistant”, “content”: “yes, how can I help you?”},
{“role”: “assistant”, “content”: “What’s your problem?”},
…
],

dialog_2 = [{“role”: “user”, “content”: “hello”},
{“role”: “assistant”, “content”: “hello”},
{“role”: “user”, “content”: “there is a problem with my order”},
{“role”: “assistant”, “content”: “Can I have your order number??”},
…],

…
}

there are 7k dialogs in this structure, but if I give a dataset, it does not do successful learning and does not respond properly. How can I use this dataset successfully? I give it the same as it is.

full_data = list(data_dict)
train_data = []
for conversation_id in full_data:
    conversation = data_dict[conversation_id]
    user_message_buffer = ""
    assistant_response_buffer = ""
    assistant_responses = []
    dialogue_context = ""
    messages_tranin = []


    
    messages_tranin.append({
            "from":"system",
            "value":DEFAULT_SYSTEM_PROMPT,
                                
        })

    for message in conversation:
        role = message['role']
        content = message['content']
        message_type = message["type"]
        if message_type != "chat":
            continue
            content = f"<--{message_type}-->"
        if role == "user":
          role = "human"
        else:
          role = "gpt"
        messages_tranin.append({
            "from":role,
            "value":content,
                                
        })
    if len(messages_tranin) > 10:
      train_data.append({"conversations":messages_tranin})

tokenizer = get_chat_template(
    tokenizer,
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
    chat_template="chatml",
     map_eos_token = True,
)

def apply_template(examples):
    messages = examples["conversations"]
    print(messages)
    text = [tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False) for message in messages]
    return {"text": text}```

Topic		Replies	Views
Training data is not working Beginners	4	169	November 18, 2024
Chatbot for my e-commerce json data Beginners	0	578	November 29, 2023
Error with load model from JSON in datasets 🤗Datasets	2	667	November 25, 2023
Using DialoGPT for Text Classification Beginners	1	1012	December 29, 2021
Questions about ordering training inputs when fine-tuning models Beginners	5	2476	December 4, 2023

Data problem for live support for my e-commerce site

Related topics