Generate dataset with empty features

I am trying to write a custom dataset using GeneratorBasedBuilder, and some samples are sometimes missing columns.

I really don't know how to deal with it. Maybe we need to create a "nullable" feature value? Is that possible?

Something like:

datasets.features.Sequence({"text": datasets.Value("string", nullable=True),
                            "answer_start": datasets.Value("int32"),})

Hi! We don't support nullable features, since they create issues with type inference.
Instead, feel free to use a sentinel value: for example, an empty string when a string is missing, or -1 for a missing label.
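A minimal sketch of that advice, assuming each raw sample arrives as a plain dict that may lack some keys. The names `DEFAULTS`, `fill_missing` and `generate_examples` are hypothetical; in a real GeneratorBasedBuilder the same logic would live inside your `_generate_examples` method, which yields `(key, example)` pairs. This version uses only the standard library, so it runs without `datasets` installed:

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical per-feature sentinels, following the suggestion above:
# "" for a missing string, -1 for a missing integer (label/offset).
DEFAULTS: Dict[str, Any] = {"text": "", "answer_start": -1}


def fill_missing(record: Dict[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Return a new dict containing exactly the keys in `defaults`.

    Keys that are absent from `record` (or explicitly None) get their
    sentinel value, so every example matches the declared schema.
    Keys in `record` that are not in `defaults` are dropped.
    """
    filled = {}
    for key, default in defaults.items():
        value = record.get(key)
        filled[key] = default if value is None else value
    return filled


def generate_examples(raw_records: List[Dict[str, Any]]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    # Same shape as GeneratorBasedBuilder._generate_examples: yield (key, example).
    for idx, record in enumerate(raw_records):
        yield idx, fill_missing(record)


if __name__ == "__main__":
    raw = [
        {"text": "Paris", "answer_start": 12},
        {"answer_start": 3},  # "text" column missing
        {"text": "Rome"},     # "answer_start" column missing
    ]
    for key, example in generate_examples(raw):
        print(key, example)
```

For a `Sequence` feature like the one in the question, the same per-field defaults would be applied to each element of the sequence before yielding. Note that a legitimate value such as `answer_start=0` is kept, because only missing or `None` values are replaced.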

Note that label values outside the range of class labels, i.e. -1, are now flagged in the dataset viewer.
