Add Sequence(feature=ClassLabel(...), ...) to an existing dataset

I have a dataset (BIO tagging) with the following features:

    'words': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
    'word_labels': Sequence(feature=ClassLabel(num_classes=3, names=['B', 'I', 'O'], id=None), length=-1, id=None)

I preprocessed it with the tokenize_and_align_labels function that is provided here.

After preprocessing, the new feature added alongside the existing ones looks like this:

    'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)

Is there a way to get a Sequence(feature=ClassLabel(...), ...) instead? I’ve already tried Dataset.class_encode_column, but it doesn’t work with a Sequence. I’ve also tried Dataset.cast_column, but that doesn’t work either.

Hi! Sadly, I can’t reproduce the error. Could you please update your installation of datasets to the newest version with pip install -U datasets and run dset.cast_column("labels", datasets.Sequence(datasets.ClassLabel(names=<labels>))) again? In case it fails, please share the entire traceback with us.