Label 2 id not working

maazmusa · May 13, 2024, 10:08pm

I recently started building a text classification pipeline using this tutorial:

I converted my own data to a dataset with 1 col called text and 1 col called label.

I then did what the tutorial said, but I got this error:

"""

Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (label in this case) have excessive nesting (inputs type list where type int is expected).

"""

I thought the id2label and label2id parameters of the model would take care of this, but they didn't, so I added a line in my batch tokenizer to convert the labels to int.
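For readers hitting the same error, here is a minimal sketch of that kind of label conversion. The label names and the `encode_labels` helper are hypothetical, not taken from the tutorial or from the original poster's code. As far as I know, id2label and label2id are stored in the model config to name the output classes; they do not convert string labels in the dataset, so the label column has to hold integers before training.

```python
# Hypothetical string labels; replace with the values in your own "label" column.
label_names = ["negative", "neutral", "positive"]

# The same two mappings that are passed to the model as label2id / id2label.
label2id = {name: i for i, name in enumerate(label_names)}
id2label = {i: name for name, i in label2id.items()}


def encode_labels(batch):
    """Replace string labels with integer ids, one int per example."""
    batch["label"] = [label2id[name] for name in batch["label"]]
    return batch


# A batch shaped like what a batched map function receives: a dict of lists.
batch = {"text": ["great", "meh"], "label": ["positive", "neutral"]}
encoded = encode_labels(batch)
print(encoded["label"])
```

With a `datasets.Dataset`, a function like this could be applied with `dataset.map(encode_labels, batched=True)`, either on its own or merged into the tokenization function.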

Now my tokenized dataset has 1 col called text, 1 col called label, 1 col called input_ids, 1 col called attention_masks and 1 more column.

My question is: which columns does the Trainer use to train and validate the text classification pipeline? Should I remove all other cols from my tokenized dataset?

thetraintomars · June 12, 2025, 9:51am

From reading the troubleshooting docs, the trainer ignores any columns that it doesn’t use, like any untokenized string fields.
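A rough sketch of how I understand that filtering to work, assuming the default `remove_unused_columns=True` in `TrainingArguments`: the Trainer keeps the columns whose names match arguments of the model's forward method, plus the label column. The `forward` function and the `kept_columns` helper below are stand-ins written for illustration, not the library's actual code.

```python
import inspect


def forward(input_ids=None, attention_mask=None, labels=None):
    """Stand-in for a sequence classification model's forward method."""


def kept_columns(dataset_columns, forward_fn):
    # Argument names accepted by the forward method.
    accepted = set(inspect.signature(forward_fn).parameters)
    # Assumption: label columns are also kept, and the data collator
    # renames "label" to "labels" when it builds the batch.
    accepted |= {"label", "label_ids"}
    return [c for c in dataset_columns if c in accepted]


columns = ["text", "label", "input_ids", "attention_mask"]
print(kept_columns(columns, forward))
```

Under these assumptions the raw text column is dropped automatically, so removing it by hand should not be necessary. Note that the tokenizer's output column is normally named attention_mask (singular).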

Could you share your tokenizer code to convert string labels to int? I am also having an issue where the tutorial code is broken and the trainer seems to ignore id2label and label2id.
