Problem with Classlabel : Class label -100 greater than configured num_classes 18

Misterpy · June 17, 2022, 1:22pm

Hello everyone,
I am working on information extraction from documents using LayoutLMv2, which I fine-tune on my own data with the Trainer. I used this example to prepare the data.

After processing the data with the LayoutLMv2 processor, some special tokens get the label -100, which is expected. But when I convert the processed data into a dataset with the features I specified, I get an error about the value -100. I think ClassLabel infers the valid class indices from the names, so if I declare 18 classes and it finds the value -100, it raises an error.

If I replace datasets.ClassLabel(names=labels) with datasets.Value(dtype='int64'), it works, but I wonder whether that is the best way to do it. I also find the error strange because the same code worked before, and the example I used no longer works either (same error), so I wonder whether it is related to the datasets version.

The example I used: Google Colab
My code:

Thank you in advance for your help

nielsr · June 19, 2022, 6:59pm

Hi,

Thanks for reporting. It’s a mistake on my side: the ‘labels’ should be of type datasets.Sequence(feature=datasets.Value(dtype='int64')) rather than ClassLabel.
