TypeError: Couldn't cast array of type int64 to null

John6666 · February 6, 2025, 1:52pm

To begin with, Python's language specification has None but no null, so the null in this error message most likely comes from pandas or pyarrow.
In pandas, for example, I believe that just having “” in the data can end up producing a null. I suspect that data like this causes the problem when the datasets library performs an implicit type conversion internally. The int64 side, on the other hand, is probably data that was tokenized successfully.

github.com/huggingface/datasets

"Couldn't cast array of type" in complex datasets

opened 02:16PM - 19 Jun 23 UTC

closed 03:13PM - 26 Jul 23 UTC

piercefreeman

### Describe the bug

When doing a map of a dataset with complex types, sometimes `datasets` is unable to interpret the valid schema of a returned datasets.map() function. This often comes from conflicting types, like when both empty lists and filled lists are competing for the same field value.

This is prone to happen in batch mapping, when the mapper returns a sequence of null/empty values and other batches are non-null. A workaround is to manually cast the new batch to a pyarrow table (like implemented in this [workaround](https://github.com/piercefreeman/lassen/pull/3)) but it feels like this ideally should be solved at the core library level.

Note that the reproduction case only throws this error if the first datapoint has the empty list. If it is processed later, datasets already detects its representation as list-type and therefore allows the empty list to be provided.

### Steps to reproduce the bug

A trivial reproduction case:

```python
from typing import Any, Iterator

import pandas as pd
import pytest
from datasets import Dataset


def batch_to_examples(batch: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    # All columns in a batch are expected to share a single length.
    lengths = {len(values) for values in batch.values()}
    for i in range(next(iter(lengths))):
        yield {feature: values[i] for feature, values in batch.items()}


def examples_to_batch(examples) -> dict[str, list[Any]]:
    batch = {}
    for example in examples:
        for feature, value in example.items():
            if feature not in batch:
                batch[feature] = []
            batch[feature].append(value)
    return batch


# explicit_schema is unused here; it has a default so that map() can call
# the function with the batch alone.
def batch_process(examples, explicit_schema: bool = False):
    new_examples = []
    for example in batch_to_examples(examples):
        new_examples.append(dict(texts=example["raw_text"].split()))
    return examples_to_batch(new_examples)


df = pd.DataFrame(
    [
        {"raw_text": ""},
        {"raw_text": "This is a test"},
        {"raw_text": "This is another test"},
    ]
)
dataset = Dataset.from_pandas(df)

# datasets won't be able to typehint a dataset that starts with an empty example.
with pytest.raises(TypeError, match="Couldn't cast array of type"):
    dataset = dataset.map(
        batch_process,
        batched=True,
        batch_size=1,
        num_proc=1,
        remove_columns=dataset.column_names,
    )
```

This results in crashes like:

```bash
  File "/Users/piercefreeman/Library/Caches/pypoetry/virtualenvs/example-9kBqeSPy-py3.11/lib/python3.11/site-packages/datasets/table.py", line 1819, in wrapper
    return func(array, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/piercefreeman/Library/Caches/pypoetry/virtualenvs/example-9kBqeSPy-py3.11/lib/python3.11/site-packages/datasets/table.py", line 2109, in cast_array_to_feature
    return array_cast(array, feature(), allow_number_to_str=allow_number_to_str)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/piercefreeman/Library/Caches/pypoetry/virtualenvs/example-9kBqeSPy-py3.11/lib/python3.11/site-packages/datasets/table.py", line 1819, in wrapper
    return func(array, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/piercefreeman/Library/Caches/pypoetry/virtualenvs/example-9kBqeSPy-py3.11/lib/python3.11/site-packages/datasets/table.py", line 1998, in array_cast
    raise TypeError(f"Couldn't cast array of type {array.type} to {pa_type}")
TypeError: Couldn't cast array of type string to null
```

### Expected behavior

The code should successfully map and create a new dataset without error.

### Environment info

Mac OSX, Linux

