Problem with ClassLabel: Class label -100 greater than configured num_classes 18

Hello everyone,
I am working on information extraction from documents, using the LayoutLMv2 model fine-tuned on my own data. To streamline training I use the Trainer, and I followed this example to prepare the data.

After processing the data with the LayoutLMv2 processor, some special tokens get a label of -100, which is expected. But when I convert the data into a datasets dataset with the features specified explicitly, I get an error related to the value -100. I think ClassLabel automatically derives the class indices from the names, so if I declare 18 classes and it finds the value -100, it raises an error.

If I replace datasets.ClassLabel(names=labels) with datasets.Value(dtype='int64'), it works, but I wonder whether that is the best way to do it. I also find the error odd because this used to work, and the example I used no longer works either (same error), so I wonder whether it is related to the datasets version.
The example I used: Google Colab
My code:

Thank you in advance for your help!

Hi,

Thanks for reporting. It's a mistake on my side: the labels are actually of type datasets.Sequence(feature=datasets.Value(dtype='int64')) rather than ClassLabel.
