Hi, I am using the VS Code AI Toolkit. I generated the default project template, which uses dataset-classification.json as the training dataset. I want to switch to my own custom dataset, so I changed the "data_configs" section of olive-config.json to the following:
"data_configs": [
  {
    "name": "dataset_default_train",
    "type": "HuggingfaceContainer",
    "user_script": "finetuning/qlora_user_script.py",
    "load_dataset_config": {
      "data_name": "json",
      "data_files": "dataset/chat-dataaset.json",
      "split": "train"
    },
    "pre_process_data_config": {
      "dataset_type": "corpus",
      "text_cols": [
        "INSTRUCTION",
        "RESPONSE"
      ],
      "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>",
      "corpus_strategy": "join",
      "source_max_len": 1024,
      "pad_to_max_len": false,
      "use_attention_mask": false
    }
  }
]
My dataset (the file name is correct) has the following structure:
{"INSTRUCTION": "Who is the deares of them all?", "RESPONSE": "Maria Smith"}
{"INSTRUCTION": "Who has the nicest and most tender kiss in the world?", "RESPONSE": "Maria Smith the nice"}
{"INSTRUCTION": "who is the best person on earth?", "RESPONSE": "Maria Smith"}
But every time I run arrow_dataset.py, I get an error in builder.py: "Please pass features or at least one example when writing data".
When I change the config back to the default one, everything works. I can't figure out what is wrong with this text_template.
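
For reference, here is a minimal sketch of what I expect the template to produce for one record of my dataset. It assumes the file is JSON Lines (one object per line) and that the placeholders are filled in the same way as Python's str.format. That is only my assumption about how Olive applies text_template, not its actual implementation, and the helper names parse_json_lines and render are made up for this illustration.

```python
import json

# Template and columns copied from pre_process_data_config in my config.
TEXT_TEMPLATE = "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>"
TEXT_COLS = ["INSTRUCTION", "RESPONSE"]


def parse_json_lines(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def render(record, template=TEXT_TEMPLATE, cols=TEXT_COLS):
    """Fill the template from one record (assumed str.format semantics)."""
    missing = [c for c in cols if c not in record]
    if missing:
        raise KeyError(f"record is missing columns: {missing}")
    return template.format(**{c: record[c] for c in cols})


# One line copied from my dataset file.
sample = '{"INSTRUCTION": "who is the best person on earth?", "RESPONSE": "Maria Smith"}\n'

records = parse_json_lines(sample)
rendered = [render(r) for r in records]
print(rendered[0])
```

If the same check is run against the real file and it yields zero records or raises on a missing column, the problem would be in the data file or its column names rather than in the template itself.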