Hi all, I would like to pass my custom dataset to the Trainer for text classification. I have read an example of how to pass one of the pre-packaged datasets to the Trainer, but what I don't understand is: after tokenization, what should the columns holding the input_ids and the labels be named? And before tokenization, what should the columns holding the text and the labels be named? Thanks a lot
hey @rahmanoladi, in general you can have whatever column names you want for the text and labels before tokenization - it’s up to you to decide how the text should be processed.
once you’ve tokenized the text, you shouldn’t need to rename the resulting columns like input_ids and attention_mask (and i wouldn’t recommend it, since it will probably break the Trainer).

by default, the Trainer looks for the label column name labels, but you can override this by specifying the value of TrainingArguments.label_names: Trainer — transformers 4.5.0.dev0 documentation
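Putting these points together, here is a minimal sketch of a batched preprocessing function. It assumes the raw data has a text column called review and a label column called sentiment; both names are hypothetical, so use whatever yours are called. toy_tokenizer is a hypothetical whitespace stand-in so the snippet runs without downloading anything. In practice you would pass a real tokenizer (e.g. one from AutoTokenizer.from_pretrained(...)) and apply the function with dataset.map(preprocess, batched=True). The tokenizer's output keys are left untouched, and the labels are copied into a labels column, the default name the Trainer looks for.

```python
from typing import Callable, Dict, List

Batch = Dict[str, list]


def make_preprocess(
    tokenizer: Callable[..., Dict[str, list]], text_column: str, label_column: str
) -> Callable[[Batch], Dict[str, list]]:
    """Build a batched preprocessing function (e.g. for Dataset.map(..., batched=True))."""

    def preprocess(batch: Batch) -> Dict[str, list]:
        # Tokenize whichever column holds the raw text. The tokenizer decides the
        # output names (input_ids, attention_mask, ...), so keep them unchanged.
        encoded = tokenizer(batch[text_column], truncation=True)
        # Copy the labels into "labels", the column name the Trainer uses by default.
        encoded["labels"] = list(batch[label_column])
        return encoded

    return preprocess


def toy_tokenizer(texts: List[str], truncation: bool = True, max_length: int = 8) -> Dict[str, list]:
    """Hypothetical stand-in for a real tokenizer: whitespace split over a tiny fixed vocab."""
    vocab = {"[UNK]": 1, "great": 2, "movie": 3, "boring": 4, "plot": 5}
    input_ids = []
    for text in texts:
        ids = [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]
        if truncation:
            ids = ids[:max_length]
        input_ids.append(ids)
    # No padding here, so every real token gets attention mask 1.
    attention_mask = [[1] * len(ids) for ids in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Raw data with arbitrary (hypothetical) column names.
batch = {"review": ["Great movie", "Boring plot"], "sentiment": [1, 0]}
preprocess = make_preprocess(toy_tokenizer, text_column="review", label_column="sentiment")
out = preprocess(batch)
print(out)
# {'input_ids': [[2, 3], [4, 5]], 'attention_mask': [[1, 1], [1, 1]], 'labels': [1, 0]}
```

A real BERT-style tokenizer may also return token_type_ids; leave that column as it is too. The original review and sentiment columns can be dropped with map's remove_columns argument, although by default the Trainer also ignores columns that the model's forward method does not accept.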