Column names of custom dataset for use with trainer

Hi all, I would like to pass my custom dataset to the Trainer for text classification. I have read an example of how to pass one of the pre-packaged datasets to the Trainer, but what I don’t understand is: what should the columns holding the input_ids and the labels be named after tokenization? Also, before tokenization, what should the columns holding the text and the labels be named? Thanks a lot

hey @rahmanoladi, in general you can have whatever column names you want for the text and labels before tokenization - it’s up to you to decide how the text should be processed.

once you’ve tokenized the text, you shouldn’t need to rename the resulting columns like input_ids and attention_mask (and i wouldn’t recommend this since it will probably break the Trainer logic).
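to make the column flow concrete, here is a minimal, library-free sketch. everything in it is an assumption for illustration: the column names review and sentiment are made up, and toy_tokenizer is a hypothetical stand-in for a real tokenizer that only mimics the shape of its output (input_ids and attention_mask). the point is that the text and label columns can be called anything beforehand, the tokenizer's output names are left alone, and only the label column ends up under the name labels.

```python
# Column names before tokenization are arbitrary; these two are made up.
TEXT_COLUMN = "review"
LABEL_COLUMN = "sentiment"


def toy_tokenizer(texts):
    """Hypothetical stand-in for a real tokenizer.

    It uses each word's length as a fake token id, which is enough to
    show the shape of the output columns.
    """
    input_ids = [[len(word) for word in text.split()] for text in texts]
    attention_mask = [[1] * len(ids) for ids in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def prepare_batch(batch, tokenize=toy_tokenizer):
    """Turn a batch with custom column names into model-ready columns."""
    encoded = tokenize(batch[TEXT_COLUMN])
    # Leave the tokenizer's output names untouched; only the label
    # column is copied under the name "labels".
    encoded["labels"] = list(batch[LABEL_COLUMN])
    return encoded


raw_batch = {
    "review": ["great movie", "not my cup of tea"],
    "sentiment": [1, 0],
}
model_inputs = prepare_batch(raw_batch)
print(sorted(model_inputs))  # ['attention_mask', 'input_ids', 'labels']
```

with the real libraries you would presumably swap toy_tokenizer for your actual tokenizer and apply a function like this over the whole dataset, but the column naming would stay the same.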

by default, the Trainer looks for a label column named labels, but you can override this by setting TrainingArguments.label_names (see the Trainer page of the transformers 4.5.0.dev0 documentation).