Hi, I've followed the documentation closely to write my dataset loading script. Loading the dataset with datasets.load_dataset works fine.
But when I tried to generate the dataset metadata as described in this link, the following error occurred:
❯ datasets-cli test jherng/xd-violence --save_info --all_configs
Loading Dataset Infos from C:\Users\Jia Herng\.cache\huggingface\modules\datasets_modules\datasets\jherng--xd-violence\364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e
Overwrite dataset info from restored data version if exists.
Loading Dataset info from C:\Users\Jia Herng\.cache\huggingface\datasets/jherng___xd-violence/video/0.0.0/364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e
Testing builder 'video' (1/4)
Generating dataset xd-violence (C:/Users/Jia Herng/.cache/huggingface/datasets/jherng___xd-violence/video/0.0.0/364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e)
Downloading and preparing dataset xd-violence/video (download: 79.64 GiB, generated: 929.76 KiB, post-processed: Unknown size, total: 79.64 GiB) to C:/Users/Jia Herng/.cache/huggingface/datasets/jherng___xd-violence/video/0.0.0/364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e...
Downloading took 0.0 min
Checksum Computation took 0.0 min
Downloading took 0.0 min
Checksum Computation took 0.0 min
Downloading took 0.0 min
Checksum Computation took 0.0 min
Downloading data files: 100%|████████████████████████████████████████| 3950/3950 [00:03<00:00, 1192.69it/s]
Downloading took 0.0 min
Checksum Computation took 0.0 min
Downloading data files: 100%|████████████████████████████████████████| 800/800 [00:00<00:00, 1220.36it/s]
Downloading took 0.0 min
Checksum Computation took 0.0 min
Generating train split
Generating train split: 100%|████████████████████████████████████████| 3950/3950 [00:00<00:00, 21280.08 examples/s]
Generating test split
Generating test split: 100%|████████████████████████████████████████| 800/800 [00:00<00:00, 10255.68 examples/s]
Dataset xd-violence downloaded and prepared to C:/Users/Jia Herng/.cache/huggingface/datasets/jherng___xd-violence/video/0.0.0/364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e. Subsequent calls will reuse this data.
Loading Dataset Infos from C:\Users\Jia Herng\.cache\huggingface\modules\datasets_modules\datasets\jherng--xd-violence\364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e
Dataset card saved at C:\Users\Jia Herng\.cache\huggingface\modules\datasets_modules\datasets\jherng--xd-violence\364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e\README.md
Loading Dataset Infos from C:\Users\Jia Herng\.cache\huggingface\modules\datasets_modules\datasets\jherng--xd-violence\364a20a2942b6ff05e759ca668d2770b88448d3a7aaff11abb07ede7a7b56f8e
Traceback (most recent call last):
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\Scripts\datasets-cli.exe\__main__.py", line 7, in <module>
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\commands\datasets_cli.py", line 39, in main
service.run()
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\commands\test.py", line 141, in run
for j, builder in enumerate(get_builders()):
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\commands\test.py", line 124, in get_builders
yield builder_cls(
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\builder.py", line 383, in __init__
info = self.get_exported_dataset_info()
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\builder.py", line 507, in get_exported_dataset_info
return self.get_all_exported_dataset_infos().get(self.config.name, DatasetInfo())
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\builder.py", line 493, in get_all_exported_dataset_infos
return DatasetInfosDict.from_directory(cls.get_imported_module_dir())
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\datasets\info.py", line 430, in from_directory
dataset_card_data = DatasetCard.load(Path(dataset_infos_dir) / "README.md").data
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\huggingface_hub\repocard.py", line 186, in load
return cls(f.read(), ignore_metadata_errors=ignore_metadata_errors)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\huggingface_hub\repocard.py", line 77, in __init__
self.content = content
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\huggingface_hub\repocard.py", line 95, in content
data_dict = yaml.safe_load(yaml_block)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\__init__.py", line 125, in safe_load
return load(stream, SafeLoader)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\__init__.py", line 81, in load
return loader.get_single_data()
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 60, in construct_document
for dummy in generator:
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 413, in construct_yaml_map
value = self.construct_mapping(node)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 218, in construct_mapping
return super().construct_mapping(node, deep=deep)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 143, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 100, in construct_object
data = constructor(self, node)
File "C:\Users\Jia Herng\miniconda3\envs\fyp-env\lib\site-packages\yaml\constructor.py", line 427, in construct_undefined
raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
in "<unicode string>", line 10, column 16:
shape: !!python/tuple
^
I suspect it's due to the use of Array2D in the dataset: its shape is written to the metadata file with a "!!python/tuple" tag, and the file is then read back with yaml.safe_load() (in huggingface_hub's repocard.py, per the traceback), which has no constructor for that tag and raises this error.
This is the partially generated dataset metadata README.md:
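To check that suspicion in isolation, here is a minimal sketch using only PyYAML. The `feature_meta` dictionary and its shape `(16, 2048)` are hypothetical stand-ins, not the real output of `datasets`; the point is only that a Python tuple dumped with the default dumper gets the `!!python/tuple` tag, which `yaml.safe_load` refuses to construct, while a list round-trips.

```python
import yaml

# Hypothetical stand-in for the metadata of an Array2D feature.
# The shape value is made up for illustration.
feature_meta = {"shape": (16, 2048), "dtype": "float32"}

# yaml.dump uses the full Dumper, which tags Python tuples as !!python/tuple.
dumped = yaml.dump(feature_meta)

# yaml.safe_load uses SafeLoader, which has no constructor for that tag.
try:
    yaml.safe_load(dumped)
    safe_load_failed = False
except yaml.constructor.ConstructorError:
    safe_load_failed = True

# Converting the tuple to a list first produces plain YAML that loads safely.
list_meta = {**feature_meta, "shape": list(feature_meta["shape"])}
round_tripped = yaml.safe_load(yaml.safe_dump(list_meta))

print(dumped)
print("safe_load failed:", safe_load_failed)
print("round-tripped:", round_tripped)
```

If this reproduces the same ConstructorError, the problem would seem to be in how the tuple shape is serialized rather than in the loading script itself.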
---
dataset_info:
- config_name: i3d_rgb
  features:
  - name: id
    dtype: string
  - name: feature
    dtype:
      array2_d:
        shape: !!python/tuple
        - 2048
        dtype: float32
  - name: binary_target
    dtype:
      class_label:
        names:
          '0': Non-violence
          '1': Violence
  - name: multilabel_target
    sequence:
      class_label:
        names:
          '0': Normal
          '1': Fighting
          '2': Shooting
          '3': Riot
          '4': Abuse
          '5': Car accident
          '6': Explosion
  - name: frame_annotations
    sequence:
    - name: start
      dtype: int32
    - name: end
      dtype: int32
  splits:
  - name: train
    num_bytes: 10535081525
    num_examples: 19750
  - name: test
    num_bytes: 1512537525
    num_examples: 4000
  download_size: 12040668091
  dataset_size: 12047619050
- config_name: video
  features:
  - name: id
    dtype: string
  - name: path
    dtype: string
  - name: binary_target
    dtype:
      class_label:
        names:
          '0': Non-violence
          '1': Violence
  - name: multilabel_target
    sequence:
      class_label:
        names:
          '0': Normal
          '1': Fighting
          '2': Shooting
          '3': Riot
          '4': Abuse
          '5': Car accident
          '6': Explosion
  - name: frame_annotations
    sequence:
    - name: start
      dtype: int32
    - name: end
      dtype: int32
  splits:
  - name: train
    num_bytes: 782565
    num_examples: 3950
  - name: test
    num_bytes: 169505
    num_examples: 800
  download_size: 85510639707
  dataset_size: 952070
---
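One stopgap that might work is to strip the `!!python/tuple` tag from the generated README.md text, so the sequence under `shape:` loads as a plain YAML list. The sketch below uses only the standard library; `strip_python_tuple_tags` is a hypothetical helper, not part of `datasets`, and whether `datasets` then accepts a list-valued shape for Array2D is an assumption I have not verified.

```python
import re

# Matches a trailing `!!python/tuple` tag (and the spaces before it) at the end of a line.
TUPLE_TAG = re.compile(r"[ \t]*!!python/tuple[ \t]*$", re.MULTILINE)


def strip_python_tuple_tags(card_text: str) -> str:
    """Hypothetical helper: drop `!!python/tuple` tags from dataset card text.

    The sequence items that follow the tag are left untouched, so a safe
    YAML loader would read them as an ordinary list.
    """
    return TUPLE_TAG.sub("", card_text)


# Small excerpt shaped like the generated metadata above.
sample = "shape: !!python/tuple\n- 2048\ndtype: float32\n"
cleaned = strip_python_tuple_tags(sample)
print(cleaned)
```

In practice the helper would be applied to the contents of the cached README.md before re-running the command, but that only hides the symptom; the tuple would still be written again the next time the info is saved.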
Appreciate any help!