@lhoestq One quick follow-up. Suppose my data is now a NumPy array of shape 100x2. How should I define the data type for this feature in the loading script? Currently, I am getting this error:
Using custom data configuration default
Downloading and preparing dataset proto_data/default to /home/aclifton/.cache/huggingface/datasets/proto_data/default/0.0.0/e33b001c2bee045d8ad072bd018561ee193303716d8cdd062cefc3a83a8d655b...
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5035.18it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1416.04it/s]
Traceback (most recent call last):
File "/home/aclifton/rf_fp/tmp.py", line 5, in <module>
ds = load_dataset('/RAID/users/aclifton/rffp_datasets/proto_data_top_25_labels_data')
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/load.py", line 1691, in load_dataset
builder_instance.download_and_prepare(
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 605, in download_and_prepare
self._download_and_prepare(
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 1104, in _download_and_prepare
super()._download_and_prepare(dl_manager, verify_infos, check_duplicate_keys=verify_infos)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 694, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/builder.py", line 1095, in _prepare_split
example = self.info.features.encode_example(record)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1356, in encode_example
return encode_nested_example(self, example)
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1007, in encode_nested_example
return {k: encode_nested_example(sub_schema, sub_obj) for k, (sub_schema, sub_obj) in zip_dict(schema, obj)}
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1007, in <dictcomp>
return {k: encode_nested_example(sub_schema, sub_obj) for k, (sub_schema, sub_obj) in zip_dict(schema, obj)}
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1047, in encode_nested_example
return [encode_nested_example(schema.feature, o) for o in obj]
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1047, in <listcomp>
return [encode_nested_example(schema.feature, o) for o in obj]
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 1052, in encode_nested_example
return schema.encode_example(obj) if obj is not None else None
File "/home/aclifton/anaconda3/envs/rffp/lib/python3.9/site-packages/datasets/features/features.py", line 456, in encode_example
return float(value)
TypeError: only size-1 arrays can be converted to Python scalars