I've got several pandas DataFrames saved to CSV files. I'm trying to create a single Dataset object by loading them with load_dataset():
my_ds = load_dataset('/path/to/data_dir')
I haven't explicitly checked, but I'm pretty certain all the labels in the label column are strings. Whenever I try to load the dataset, I get the following error:
pyarrow.lib.ArrowInvalid: Failed to parse string: 'a0d6fb' as a scalar of type int64
Here is the full traceback:
Resolving data files: 100%|██████████| 447/447 [00:00<00:00, 111898.17it/s]
Using custom data configuration pd_data_test-d4ecbb8864e740ad
Downloading and preparing dataset csv/pd_data_test to /home/aclifton/.cache/huggingface/datasets/csv/pd_data_test-d4ecbb8864e740ad/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519...
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 82.99it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 7.33it/s]
Traceback (most recent call last):
File "/home/aclifton/rf_fp/gather_files.py", line 71, in <module>
my_ds = load_dataset('/home/aclifton/rf_fp/pd_data_test')
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/load.py", line 1691, in load_dataset
builder_instance.download_and_prepare(
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 605, in download_and_prepare
self._download_and_prepare(
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 694, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 1154, in _prepare_split
writer.write_table(table)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/arrow_writer.py", line 523, in write_table
pa_table = table_cast(pa_table, self._schema)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1860, in table_cast
return cast_table_to_schema(table, schema)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1843, in cast_table_to_schema
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1843, in <listcomp>
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1672, in wrapper
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1672, in <listcomp>
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1808, in cast_array_to_feature
return array_cast(array, feature(), allow_number_to_str=allow_number_to_str)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1674, in wrapper
return func(array, *args, **kwargs)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/table.py", line 1741, in array_cast
return array.cast(pa_type)
File "pyarrow/array.pxi", line 826, in pyarrow.lib.Array.cast
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/pyarrow/compute.py", line 375, in cast
return call_function("cast", [arr], options)
File "pyarrow/_compute.pyx", line 531, in pyarrow._compute.call_function
File "pyarrow/_compute.pyx", line 330, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Failed to parse string: 'a0d6fb' as a scalar of type int64
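Since the label column has not been checked explicitly, a minimal sketch of such a check follows. It uses only the standard library and reports, per file, whether every value in one column parses as an integer. The column name `label` and the helper names `looks_like_int` and `summarize_label_column` are assumptions for illustration, not part of the datasets library. If some files come back as "int-like" while others contain values such as 'a0d6fb', that might explain why the column is being cast to int64, though this is only a guess.

```python
import csv


def looks_like_int(value: str) -> bool:
    """Return True if Python's int() accepts the string."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def summarize_label_column(csv_paths, column="label"):
    """Classify one column of each CSV file (hypothetical helper).

    Returns a dict mapping each path (as a string) to:
      "int-like" - every value parses as an integer
      "mixed"    - some values parse as integers, some do not
      "string"   - no value parses as an integer (also used for empty files)
    """
    summary = {}
    for path in csv_paths:
        # Read every value as raw text so nothing is type-inferred for us.
        with open(path, newline="") as handle:
            values = [row[column] for row in csv.DictReader(handle)]
        flags = [looks_like_int(v) for v in values]
        if flags and all(flags):
            summary[str(path)] = "int-like"
        elif any(flags):
            summary[str(path)] = "mixed"
        else:
            summary[str(path)] = "string"
    return summary
```

It could be called with the list of CSV files in the data directory, for example `summarize_label_column(sorted(glob.glob('/path/to/data_dir/*.csv')))` after `import glob`.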
Any ideas about what might be going on? Thanks in advance for your help!