I followed the documentation for loading a dataset from a script as closely as I could. My dataset loads compressed NumPy files (.npz) into Array2D features, which are ultimately output as PyTorch tensors.
I can run the dataset test and metadata generation just fine, but when I actually try to load the dataset using:
ds = datasets.load_dataset('./asl_embeddings/', "default")
I get a YAML exception deep in the library code:
File "~/project/dataset_test.py", line 3, in <module>
ds = datasets.load_dataset('./asl_embeddings/', "default")
File "~/project/venv/lib/python3.9/site-packages/datasets/load.py", line 2128, in load_dataset
builder_instance = load_dataset_builder(
File "~/project/venv/lib/python3.9/site-packages/datasets/load.py", line 1851, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
File "~/project/venv/lib/python3.9/site-packages/datasets/builder.py", line 383, in __init__
info = self.get_exported_dataset_info()
File "~/project/venv/lib/python3.9/site-packages/datasets/builder.py", line 507, in get_exported_dataset_info
return self.get_all_exported_dataset_infos().get(self.config.name, DatasetInfo())
File "~/project/venv/lib/python3.9/site-packages/datasets/builder.py", line 493, in get_all_exported_dataset_infos
return DatasetInfosDict.from_directory(cls.get_imported_module_dir())
File "~/project/venv/lib/python3.9/site-packages/datasets/info.py", line 430, in from_directory
dataset_card_data = DatasetCard.load(Path(dataset_infos_dir) / "README.md").data
File "~/project/venv/lib/python3.9/site-packages/huggingface_hub/repocard.py", line 186, in load
return cls(f.read(), ignore_metadata_errors=ignore_metadata_errors)
File "~/project/venv/lib/python3.9/site-packages/huggingface_hub/repocard.py", line 77, in __init__
self.content = content
File "~/project/venv/lib/python3.9/site-packages/huggingface_hub/repocard.py", line 95, in content
data_dict = yaml.safe_load(yaml_block)
File "~/project/venv/lib/python3.9/site-packages/yaml/__init__.py", line 125, in safe_load
return load(stream, SafeLoader)
File "~/project/venv/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
for dummy in generator:
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
value = self.construct_mapping(node)
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
return super().construct_mapping(node, deep=deep)
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 143, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
data = constructor(self, node)
File "~/project/venv/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
in "<unicode string>", line 10, column 16:
shape: !!python/tuple
It seems to be choking on the metadata that declares the 2D array (the shape: !!python/tuple entry), but I don’t understand the internals well enough to see why. Any thoughts on what I’m doing wrong?
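For what it's worth, here is a minimal sketch that reproduces the same kind of error outside of datasets, assuming only that PyYAML is installed. The metadata dict is a hypothetical stand-in for my Array2D declaration, not the actual README.md content. It shows that the non-safe yaml.dump writes a tuple with the !!python/tuple tag, that yaml.safe_load refuses that tag, and that the same shape written as a plain list loads fine:

```python
import yaml

# Hypothetical stand-in for the generated metadata: a tuple-valued shape.
metadata = {"shape": (3, 4)}

# The non-safe dumper tags Python tuples as !!python/tuple.
dumped = yaml.dump(metadata)
print(dumped)

# The safe loader has no constructor for Python-specific tags, so it raises.
try:
    yaml.safe_load(dumped)
    safe_load_failed = False
except yaml.constructor.ConstructorError as err:
    safe_load_failed = True
    print("safe_load failed:", err.problem)

# Written as a plain list, the same shape round-trips through the safe loader.
as_list = yaml.safe_load(yaml.safe_dump({"shape": list(metadata["shape"])}))
print(as_list)
```

If that is really what is happening, the shape in the README.md metadata block may need to be a plain YAML sequence rather than a tagged tuple, but that is a guess on my part and I have not confirmed how datasets writes or expects this field.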