I have a dataset (BIO tagging) with the following features:
{
'words': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
'word_labels': Sequence(feature=ClassLabel(num_classes=3, names=['B', 'I', 'O'], id=None), length=-1, id=None)
}
I preprocessed it with the tokenize_and_align_labels function that is provided here.
The new feature that is added to the previous ones looks like this:
'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)
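For context, here is a rough sketch of the kind of alignment such a function performs, which is why the new column comes out as plain integers. It is not the linked function itself: the helper name align_labels, the ignore_index default of -100, and the choice to give every sub-token its word's label are all assumptions made for illustration.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Map word-level label ids onto tokens (hypothetical helper).

    word_ids: one entry per token, the index of the word it came from,
              or None for special tokens (as a fast tokenizer reports them).
    word_labels: one integer label id per word.
    """
    aligned = []
    for word_id in word_ids:
        if word_id is None:
            # Special tokens get a sentinel value outside the label range.
            aligned.append(ignore_index)
        else:
            # Assumption: every sub-token inherits the label of its word.
            aligned.append(word_labels[word_id])
    return aligned


# Example: the second word is split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 2]  # ids for 'B', 'I', 'O'
print(align_labels(word_ids, word_labels))  # [-100, 0, 1, 1, 2, -100]
```

The result is a list of Python ints, so the column is inferred as int64 rather than ClassLabel.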
Is there a way to get a Sequence(feature=ClassLabel(...), ...) instead? I’ve already tried to use Dataset.class_encode_column, but it doesn’t work with a Sequence. I’ve also tried to use Dataset.cast_column but it doesn’t work.