Add Sequence(feature=ClassLabel(...), ...) to an existing dataset

riccardobucco · April 26, 2022, 10:25am

I have a dataset (BIO tagging) with the following features:

{
    'words': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
    'word_labels': Sequence(feature=ClassLabel(num_classes=3, names=['B', 'I', 'O'], id=None), length=-1, id=None)
}

I preprocessed it with the tokenize_and_align_labels function that is provided here.

The new feature that is added to previous ones looks like this:

'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)

Is there a way to get a Sequence(feature=ClassLabel(...), ...) instead? I’ve already tried to use Dataset.class_encode_column, but it doesn’t work with a Sequence. I’ve also tried to use Dataset.cast_column but it doesn’t work.

mariosasko · May 2, 2022, 12:33pm

Hi! Sadly, I can’t reproduce the error. Could you please update your installation of datasets to the newest version with pip install -U datasets and run dset.cast_column("labels", datasets.Sequence(datasets.ClassLabel(names=<labels>))) again? In case it fails, please share the entire traceback with us.
