Column names of custom dataset for use with trainer

rahmanoladi · April 15, 2021, 5:24pm

Hi all, I would like to pass my custom dataset to the Trainer for text classification. I have read an example of how to pass one of the pre-packaged datasets to the Trainer, but what I don’t understand is: what should the names of the columns holding the input_ids and the labels be after tokenization? Also, before tokenization, what should the names of the columns holding the text and labels be? Thanks a lot

lewtun · April 16, 2021, 8:10am

hey @rahmanoladi, in general you can have whatever column names you want for the text and labels before tokenization - it’s up to you to decide how the text should be processed.

once you’ve tokenized the text, you shouldn’t need to rename the resulting columns like input_ids and attention_mask (and i wouldn’t recommend this since it will probably break the Trainer logic).

by default, the Trainer looks for a label column named labels, but you can override this by setting TrainingArguments.label_names (see: Trainer — transformers 4.5.0.dev0 documentation)

ali-hooshmand · February 1, 2024, 6:15pm

Hey @lewtun, so why is there no column named “labels” in these two tutorials (1, 2) (there is a column named “label”), and also no label_names setting in the training arguments? How does the Trainer know the label column is “label”? Thanks

wenmingface · March 31, 2024, 4:34am

There is a pattern for the label column in the Trainer: as long as your label column name is prefixed with label (e.g. label_name, labels, label_ids, etc.), the Trainer will know which one is the label column.
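
One part of this can be checked directly: when building a batch, the default data collator in transformers copies a label (or label_ids) key into labels. The sketch below assumes transformers and numpy are installed. The token ids are made up for illustration, and return_tensors="np" is used only so the example does not need PyTorch.

```python
from transformers import default_data_collator

# Two toy tokenized examples whose label key is "label", not "labels".
# The token ids are made up for illustration.
features = [
    {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "label": 0},
    {"input_ids": [101, 2307, 102], "attention_mask": [1, 1, 1], "label": 1},
]

# Collate into a batch of NumPy arrays.
batch = default_data_collator(features, return_tensors="np")

# The batch exposes the labels under "labels"; the "label" key is gone.
print(sorted(batch.keys()))
print(batch["labels"])
```

This would explain why a dataset with a column named “label” works in those tutorials without any label_names setting. It only covers the label and label_ids keys, so other names starting with label may not be handled the same way.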
