How do I format the column mapping on the autotrainer?

gokstad · January 24, 2025, 12:07pm

I have a JSONL dataset formatted exactly like this:
{"input": "Is the sky blue?", "response": "Yes the sky is blue."}

What do I enter in the Hugging Face autotrainer column mapping field? I’m not able to find any tutorials that actually explain this, nor can I find an example to emulate. So I have been stuck trying combinations, and my latest failed attempts were:
“input”: “input”, “response”: “output”
input: input, response: output
input, input, response, output
input:text, response:text
input:text, output:text

Am I supposed to put a specific line at the top of my dataset too, labeling the columns?
I’ve tried giving my dataset to ChatGPT, and it has never been able to give me a consistent answer; the answer is different each time I ask.

Alanturner2 · January 24, 2025, 1:57pm

Hi there, I read your question and I appreciate your effort in trying to figure this out. Let me help you with setting up the column mapping for your JSONL dataset in Hugging Face AutoTrainer.

Your dataset is already in the correct format with two fields: “input” for the question or prompt and “response” for the answer. These are the fields you will map to the model’s input and output.

When specifying the column mapping in AutoTrainer, you should write it like this:

{
    "input": "input",
    "output": "response"
}

This tells the system that the “input” field in your dataset corresponds to the model’s input, and the “response” field corresponds to the model’s output.

Your JSONL file does not require a header or any special label at the top. Each line of the file should simply be a valid JSON object. For example, your file should look like this:

{"input": "Is the sky blue?", "response": "Yes, the sky is blue."}
{"input": "What is 2 + 2?", "response": "2 + 2 equals 4."}
{"input": "Tell me a joke.", "response": "Why did the scarecrow win an award? Because he was outstanding in his field!"}

Make sure there are no formatting errors in your JSONL file, such as missing curly braces or commas. Each line must be a complete and valid JSON object.
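
As a quick way to check this before uploading, here is a minimal sketch in Python. It only uses the standard `json` module; the function name `validate_jsonl` and the `required_keys` default are illustrative choices for this thread's dataset, not anything AutoTrain itself provides.

```python
import json


def validate_jsonl(lines, required_keys=("input", "response")):
    """Return a list of (line_number, problem) tuples; an empty list means every line passed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            # error.msg is the parser's short description of what went wrong
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, f"missing keys: {missing}"))
    return problems


sample = [
    '{"input": "Is the sky blue?", "response": "Yes, the sky is blue."}',
    '{"input": "What is 2 + 2?", "response": "2 + 2 equals 4."',  # closing brace missing on purpose
]
print(validate_jsonl(sample))
```

To check a real file, you could pass the open file object instead of the sample list, for example `validate_jsonl(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder file name.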

When you upload your dataset to Hugging Face AutoTrainer, select JSONL as the file type and provide the column mapping exactly as shown above. You don’t need to add anything extra to your file for it to work.

I hope this helps clarify things for you! If you still face any issues or have more questions, feel free to ask. Wishing you success with your project!

gokstad · January 24, 2025, 10:37pm

Thanks for the response, but that doesn’t seem to be right either. The training run failed with this in the report:
KeyError: '{ "input": "input", "output": "response" }'

I don’t see where I can select ‘jsonl’ in the autotrainer. I see the ‘json’ switch, but that’s it.
I guess I should change “response” to “output” in the dataset?
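
If renaming the field does turn out to be necessary, it can be done with a short script rather than by hand. This is only a sketch using Python's standard `json` module; the function name `rename_key` is made up for illustration, and whether AutoTrain actually expects an "output" column is an assumption, not something confirmed in this thread.

```python
import json


def rename_key(lines, old="response", new="output"):
    """Return JSONL lines with the key `old` renamed to `new`; blank lines are skipped."""
    renamed = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if old in record:
            # pop() removes the old key and returns its value, which is stored under the new key
            record[new] = record.pop(old)
        # ensure_ascii=False keeps non-ASCII characters readable in the output
        renamed.append(json.dumps(record, ensure_ascii=False))
    return renamed


original = ['{"input": "Is the sky blue?", "response": "Yes the sky is blue."}']
for line in rename_key(original):
    print(line)
```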

EDIT: For schitz und giggles I tried it again and flipped the JSON switch. I thought this switch was just a format preference. I think it’s actually training the model now. It has always paused by now, but it’s still going and reporting ‘GET /ui/is_model_training HTTP/1.1" 200 OK’.
It was that switch the entire time?
