Custom dataset fails. "Please pass features or at least one example when writing data"

Hi, I am using the VS Code AI Toolkit. I generated the default project template, which uses dataset-classification.json as the training dataset. I want to switch to my own custom dataset, so I changed the "data_configs" section of olive-config.json to the following:

"data_configs": [
    {
        "name": "dataset_default_train",
        "type": "HuggingfaceContainer",
        "user_script": "finetuning/qlora_user_script.py",
        "load_dataset_config": {
            "data_name": "json",
            "data_files": "dataset/chat-dataaset.json",
            "split": "train"
        },
        "pre_process_data_config": {
            "dataset_type": "corpus",
            "text_cols": [
                "INSTRUCTION",
                "RESPONSE"
            ],
            "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>",
            "corpus_strategy": "join",
            "source_max_len": 1024,
            "pad_to_max_len": false,
            "use_attention_mask": false
        }
    }

My dataset (the file name is correct) has the following structure, one JSON object per line:
{"INSTRUCTION": "Who is the deares of them all?", "RESPONSE": "Maria Smith"}
{"INSTRUCTION": "Who has the nicest and most tender kiss in the world?", "RESPONSE": "Maria Smith the nice"}
{"INSTRUCTION": "who is the best person on earth?", "RESPONSE": "Maria Smith"}
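A quick sanity check that may help here: as far as I understand (not confirmed for this exact setup), this kind of error tends to appear when the loader ends up with zero usable examples, so it seems worth verifying that the file really parses as JSON Lines and that every record has the expected keys. The sketch below uses only the Python standard library; the function name `check_jsonl` and the `REQUIRED_KEYS` constant are made up for illustration, and the path is the one from the config above.

```python
import json
from pathlib import Path

# Keys the config's text_cols expects in every record (taken from the post).
REQUIRED_KEYS = ("INSTRUCTION", "RESPONSE")


def check_jsonl(path, required_keys=REQUIRED_KEYS):
    """Return (valid_record_count, problems) for a JSON Lines file."""
    problems = []
    count = 0
    text = Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Typographic quotes (“ ”) instead of " end up here.
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append(f"line {lineno}: missing keys {missing}")
            continue
        count += 1
    return count, problems


if __name__ == "__main__":
    # Path copied from the config; adjust if the project layout differs.
    n_valid, issues = check_jsonl("dataset/chat-dataaset.json")
    print(f"{n_valid} valid records")
    for issue in issues:
        print(issue)
```

If this reports 0 valid records, or lists invalid-JSON lines, the problem is probably in the data file rather than in the template.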

But each time I run arrow_dataset.py, I get an error in builder.py: "Please pass features or at least one example when writing data".
When I change the config back to the default one, everything works. I can't figure out what is wrong with this text_template.
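To rule the template itself in or out, it can be rendered by hand against one record. This is only a sketch: it assumes the placeholders are filled in with something equivalent to Python's `str.format`, which I have not verified against Olive's source, and the helper name `render` is hypothetical.

```python
# Template string copied from the config; "\n" is a newline in both JSON and Python.
TEXT_TEMPLATE = "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>"


def render(template, record):
    """Fill {PLACEHOLDER} fields from a dict; raises KeyError if a field is missing."""
    return template.format(**record)


if __name__ == "__main__":
    # Record copied from the dataset sample in the post.
    sample = {"INSTRUCTION": "who is the best person on earth?", "RESPONSE": "Maria Smith"}
    print(render(TEXT_TEMPLATE, sample))
```

If this renders cleanly for every record, the template is probably not what triggers the error, and the data file or the loading step is the more likely suspect.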

Perhaps this is an Olive issue?