Describe a nullable/optional column in a dataset loading script

Assume a dataset has a float column that isn’t populated in all of its rows.

What is the generally accepted way to signify it in datasets.Features and _generate_examples?

Setting the column value to None does not work.

Hi,

you can use Value("float32") or Value("float64") as the column feature and set missing values to None. If this doesn’t work, please open an issue on GitHub and provide the code that reproduces it. The None handling in datasets still has some rough edges, and we are currently fixing them (see pull request #3195, "More robust `None` handling" by mariosasko, in huggingface/datasets on GitHub).

Thank you very much, once again. I’m tracking the issue at #3253. I can draft a band-aid PR if that won’t interfere with the bigger pull request.

Are there any temporary fixes we can utilize in the meantime?

Hi,

thanks for reporting the issue on GitHub. Let’s discuss it there!

(we prefer the forum for questions and GitHub for bugs)