Autotrain-advanced LLM finetuning: issues with ORPO/DPO dataset format

Hey all. I’m trying to run autotrain-advanced locally. I’ve formatted my JSONL files of training examples based on the instructions here:

When I follow the setup for the SFT trainer, the suggested format works fine. When I try to use the ORPO or DPO trainers, however (with “rejected_text” and/or “prompt” keys), I run into trouble. I get variations on the following error:

Original column name autotrain_rejected_text not in the dataset. Current columns in the dataset: ['rejected_text', 'chosen']

There are a few things here that strike me as odd. The dataset I’m supplying has text, not chosen - is that getting renamed on the fly by autotrain? Looking at the params passed to the launch command, I see 'text_column': 'autotrain_text', 'rejected_text_column': 'autotrain_rejected_text'. I patterned my config file after the ones in the GitHub repo, so it has explicit column_mapping values for these, but they seem to be ignored no matter what I put in (it apparently defaults to text and rejected_text). I tried fooling it by putting autotrain_text and autotrain_rejected_text directly in the input file… no dice; it tells me those names are reserved.
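For reference, here’s roughly the data section of the config I’m running, patterned on the repo examples (the path and mapping values are mine, so treat this as a sketch of my setup rather than a known-good config):

# data section of my YAML config, modeled on the repo examples;
# these are the column_mapping values that appear to be ignored
data:
  path: ~/projects/phi-3-ft/data-phi-3-ft/
  train_split: train
  valid_split: null
  column_mapping:
    text_column: text
    rejected_text_column: rejected_text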

I’ve tried the data both as CSV and JSONL; neither works. What am I missing? Is this functionality just bugged right now?

(Bonus question: can I explicitly feed it a validation dataset somehow? It seems to ignore everything except the train.jsonl that I put in the data folder.)

If it helps: currently running autotrain-advanced 0.7.107 via WSL2. I also tested this on my MacBook Air and got the same error, so I don’t think it’s directly environment-related. I was able to train a working LoRA using the SFT trainer (with text as the only key in my JSONL dataset), so I know my install isn’t completely janked here. Google returns zero hits for my specific error message, which is honestly impressive.

please show a few lines from your jsonl and the command used for training.

Here’s a toy example of the JSONL format that yields the same behavior:

{"text":"<|user|>I don't know why you say goodbye...<|end|><|assistant|>I say hello!<|end|>","rejected_text":"<|user|>I don't know why you say goodbye...<|end|><|assistant|>I, too, say hello!<|end|>"}
{"text":"<|user|>This is the beginning of...<|end|><|assistant|>a beautiful friendship!<|end|>","rejected_text":"<|user|>This is the beginning of...<|end|><|assistant|>an ugly enmity!<|end|>"}
{"text":"<|user|>Gimme five bees for<|end|><|assistant|>a quarter, they used to say.<|end|>","rejected_text":"<|user|>Gimme five bees for<|end|><|assistant|>the beehive in my yard.<|end|>"}

And here’s an example of an attempt to configure it via command line params that produces the error:

autotrain llm \
--train \
--model microsoft/Phi-3-mini-4k-instruct \
--data-path ~/projects/phi-3-ft/data-phi-3-ft/ \
--lr 1e-4 \
--batch-size 1 \
--epochs 12 \
--trainer orpo \
--peft \
--project-name phi3-ft \
--merge_adapter

(Have redacted my HF token, obviously)

I notice that the example YAML config files for ORPO training also specify a prompt column (see configs/llm_finetuning/llama3-8b-dpo-qlora.yml in the huggingface/autotrain-advanced GitHub repo); is that in error? The dataset formatting page suggests that ORPO only wants text and rejected_text.
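If I’m reading that file right, its mapping block looks something like this (quoting from memory, so double-check against the repo):

column_mapping:
  text_column: chosen
  rejected_text_column: rejected
  prompt_text_column: prompt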

Any thoughts on things I could try here? I’m tempted to dive into the source code and see where these mystery column names are coming from.

The tool breaks at an earlier stage if I use any key other than text for the “chosen” text field. It seems to ignore whatever key I use in place of rejected_text, which… maybe suggests that it’s not actually reading that field the same way it reads text?

please see the docs (“What is AutoTrain Advanced?”). orpo needs: prompt, chosen and rejected.

you also seem to be ignoring column mapping

Thank you for your patience. I’m still struggling with this; is there a way to specify the column mappings via a command-line parameter? I can see from the example config files how to specify them in YAML, but now, when I try to run from a config file, autotrain insists that it can’t find any data or scripts in the data directory (which contains just the train.jsonl file). The web/app interface appears to expect a JSON dict for the mappings, i.e. {"text":"text","rejected_text":"rejected_text"}; I’ll explore that and see if I can get it to work there.
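One thing I’m going to try: the launch params mention text_column and rejected_text_column, so if those are exposed as CLI flags the same way the other params are, something like the following might work. To be clear, these flag spellings are my guess; I haven’t confirmed they exist:

autotrain llm \
--train \
--model microsoft/Phi-3-mini-4k-instruct \
--data-path ~/projects/phi-3-ft/data-phi-3-ft/ \
--trainer orpo \
--text-column text \
--rejected-text-column rejected_text \
--peft \
--project-name phi3-ft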

If ORPO requires a “prompt” field, the data-format documentation needs updating, as it specifically says the reward/ORPO trainer wants “text” and “rejected_text” with no “prompt”; that’s what I based my formatting on originally.
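For what it’s worth, if prompt/chosen/rejected is really what the ORPO trainer wants, I’d guess the JSONL should look something like this (the key names, and whether the chat-template tokens belong here at all, are both assumptions on my part; I’ve just re-split my toy examples):

{"prompt":"I don't know why you say goodbye...","chosen":"I say hello!","rejected":"I, too, say hello!"}
{"prompt":"This is the beginning of...","chosen":"a beautiful friendship!","rejected":"an ugly enmity!"}
{"prompt":"Gimme five bees for","chosen":"a quarter, they used to say.","rejected":"the beehive in my yard."}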