Dataset.map with None lists

I’m applying some transformations to a dataset with a labels column where some values are None. After the first .map transformation over a new field, the None values are converted into empty lists.

Is this normal behaviour? How can I preserve the None values?

Thanks in advance!

Hi! Yes, this is a known bug; see “`None` replaced by `[]` after first batch in map” (Issue #3676, huggingface/datasets on GitHub).

This can be fixed once Apache Arrow has the feature we requested here: [ARROW-15839] [C++][Python] Allow to reconstruct a ListArray with ListArray.from_arrays and keep the nulls - ASF JIRA
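
To give a rough idea of why the nulls get lost: an Arrow list column stores offsets, a flat values array, and a separate validity bitmap. Rebuilding a list array from the offsets and values alone leaves nothing to mark a row as null, so it comes back as an empty list. The snippet below is only a toy pure-Python model of that layout, not the pyarrow API; the `encode` and `decode` helpers are made up for illustration.

```python
# Toy model of Arrow's list layout (offsets + values + validity).
# This is NOT pyarrow code; encode/decode are hypothetical helpers.

def encode(rows):
    """Encode a list of lists (or None) as (offsets, values, validity)."""
    offsets, values, validity = [0], [], []
    for row in rows:
        validity.append(row is not None)
        if row is not None:
            values.extend(row)
        # A null row adds no values, so its offsets match an empty list's.
        offsets.append(len(values))
    return offsets, values, validity


def decode(offsets, values, validity=None):
    """Rebuild the rows; without validity, null rows cannot be told apart."""
    rows = []
    for i in range(len(offsets) - 1):
        if validity is not None and not validity[i]:
            rows.append(None)
        else:
            rows.append(values[offsets[i]:offsets[i + 1]])
    return rows


labels = [["a"], None, ["b", "c"]]
offsets, values, validity = encode(labels)

with_validity = decode(offsets, values, validity)  # keeps the None
without_validity = decode(offsets, values)         # None turns into []
print(with_validity)
print(without_validity)
```

In this model the null row and an empty row share the same pair of offsets, which is why the validity information has to be passed along explicitly when the array is reconstructed.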

Feel free to post a message or vote for this issue on Arrow’s JIRA to express your need; this can probably help the Arrow team prioritize it.

In the meantime we’re looking at workarounds to fix this. I’ll let you know what we come up with.

Thanks for the response, @lhoestq.

I hadn’t found the issue in the repo. I’ve voted for the issue on the Arrow JIRA.

Thanks for letting me know about a workaround when ready.