ValueError: Field 'ner_tags' from the JSON data of type list<item: string> is not compatible with ClassLabel. Compatible types are int64 and string

giggio · February 7, 2022, 6:04am

I have no idea why I’m getting this error when I call load_dataset:

from datasets import ClassLabel, Features, load_dataset

classFeatures = Features({
    'ner_tags': ClassLabel(num_classes=3, names=['O', 'B-FAR', 'I-FAR'])
})
dataset = load_dataset("json",
                       data_files="data.jsonl",
                       use_auth_token=True,
                       features=classFeatures)

Thanks

lhoestq · February 7, 2022, 9:03pm

Hi! I think this is because classFeatures doesn’t declare that ner_tags is actually a sequence of class labels:

from datasets import ClassLabel, Features, Sequence

classFeatures = Features({
    'ner_tags': Sequence(ClassLabel(names=['O', 'B-FAR', 'I-FAR']))
})

giggio · February 14, 2022, 5:27pm

Okay, I forgot about that

Now the error is different: ArrowInvalid: Failed to parse string: 'O' as a scalar of type int64

lhoestq · February 15, 2022, 3:35pm

If I recall correctly, this issue has been fixed in a recent version of the library. Could you try updating datasets?

giggio · February 17, 2022, 10:14pm

Yes

import datasets
print(datasets.__version__)

1.18.3

mariosasko · February 18, 2022, 11:34am

Hi! If I’m not mistaken, this doesn’t work for class labels nested inside a dict or a list. I think we will push the fix before the next release. In the meantime, load the dataset without specifying features, then call map to convert the tags to integers and set features to classFeatures.
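
A minimal sketch of that workaround, under a few assumptions: the file is JSON Lines with a ner_tags column holding lists of strings, the label order from earlier in the thread defines the integer ids, and the helper names encode_tags and load_with_class_labels are made up for illustration. Columns other than ner_tags keep whatever types were inferred at load time.

```python
from typing import Dict, List

# Label names taken from the thread; their order defines the integer ids.
LABEL_NAMES = ["O", "B-FAR", "I-FAR"]
LABEL2ID = {name: i for i, name in enumerate(LABEL_NAMES)}


def encode_tags(batch: Dict[str, List[List[str]]]) -> Dict[str, List[List[int]]]:
    """Convert each example's list of string tags to a list of integer ids."""
    return {"ner_tags": [[LABEL2ID[tag] for tag in tags] for tags in batch["ner_tags"]]}


def load_with_class_labels(path: str):
    """Hypothetical helper: load the file, then attach Sequence(ClassLabel)."""
    # Imported here so encode_tags stays usable without the library installed.
    from datasets import ClassLabel, Features, Sequence, load_dataset

    # 1. Load without `features`, so ner_tags is inferred as a list of strings.
    dataset = load_dataset("json", data_files=path, split="train")

    # 2. Keep the inferred types and override only the ner_tags column.
    features = Features({
        **dataset.features,
        "ner_tags": Sequence(ClassLabel(names=LABEL_NAMES)),
    })

    # 3. Convert strings to ids and set the new features in the same map call.
    return dataset.map(encode_tags, batched=True, features=features)
```

Calling load_with_class_labels("data.jsonl") should then return a dataset whose ner_tags column is typed as a sequence of class labels. An unknown tag raises a KeyError in encode_tags, which is probably preferable to silently mislabelling tokens.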

@lhoestq WDYT about adding the cast_storage method to ClassLabel as well, to support str → int conversion?

lhoestq · February 23, 2022, 2:20pm

It’s not just casting (in the sense of manipulating arrays/buffers and dtypes), but a processing operation. Because of that and to have good performance and reasonable memory usage, using map (or something similar) is probably best (especially for big datasets).

evs · March 25, 2022, 4:17pm

Hi, I’m trying to create a dataset whose ner_tags feature is of type ClassLabel, but casting is not possible when tags are nested inside a list as you said. Any idea on how to achieve this? Thanks xx
