I have a 3-level nested DatasetDict for the MultiDoGo datasets (paper splits, domain, train/dev/test) that I am trying to upload to the Hub as a community dataset.
When I test downloading it afterwards, I get:
KeyError: 'Field "builder_name" does not exist in table schema'
Seems like something is not right in the dataset_dict.json's fields... How can I solve this issue?
I ran into a similar issue recently. In my case the schema was not exactly the same across all the files in the dataset, and that made load_dataset() fail. So my guess is that one of your files does not have the field "builder_name".
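If it helps, here is a minimal sketch for checking that guess: it collects the top-level field names of every `*.json` file in a directory and reports which files lack a field that another file has. The helper names `top_level_keys` and `report_schema_mismatches` are my own, not part of `datasets`, and the sketch assumes each file is either a JSON document or JSON Lines.

```python
import json
from pathlib import Path


def top_level_keys(path):
    """Return the set of top-level field names found in one JSON or JSON Lines file."""
    text = Path(path).read_text(encoding="utf-8").strip()
    try:
        records = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON value per non-empty line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not isinstance(records, list):
        records = [records]
    keys = set()
    for record in records:
        if isinstance(record, dict):
            keys.update(record)
    return keys


def report_schema_mismatches(directory):
    """Map each JSON file name to the fields it lacks relative to the union of all files."""
    schemas = {p.name: top_level_keys(p) for p in sorted(Path(directory).glob("*.json"))}
    all_keys = set().union(*schemas.values()) if schemas else set()
    return {
        name: sorted(all_keys - keys)
        for name, keys in schemas.items()
        if all_keys - keys
    }


if __name__ == "__main__":
    # Hypothetical usage: inspect the directory that is passed to load_dataset.
    for name, missing in report_schema_mismatches(".").items():
        print(f"{name} is missing: {missing}")
```

Any file listed in the report has a different set of fields from the others, which is the situation I described above.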
Hi! Can you post the full stack trace of the error? It would help debug your issue.
Also note that support for multi-configuration datasets is still WIP (see documentation here), so load_dataset currently merges all your train sets together (and does the same for test and dev).
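One possible way to keep the train sets from being merged is to load one configuration at a time by passing explicit `data_files`. This is only a sketch under assumptions: the directory layout `<root>/<paper_split>/<domain>/<split>.json` and the helper `data_files_for` are hypothetical, so adapt them to how your files are actually stored.

```python
from pathlib import Path


def data_files_for(root, paper_split, domain, splits=("train", "dev", "test")):
    """Build a split -> file path mapping for one (paper_split, domain) pair.

    Assumes a hypothetical layout: <root>/<paper_split>/<domain>/<split>.json
    Splits whose file does not exist are left out of the mapping.
    """
    base = Path(root) / paper_split / domain
    files = {}
    for split in splits:
        candidate = base / f"{split}.json"
        if candidate.exists():
            files[split] = str(candidate)
    return files


# Hypothetical usage (requires the `datasets` package):
# from datasets import load_dataset
# files = data_files_for("data", "paper_splits", "airline")
# dataset = load_dataset("json", data_files=files)
```

Each call then only sees the files of a single domain, so the splits of different domains stay separate.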
I've just run into the same error; here's my stack trace:
>>> from datasets import load_dataset
>>> data = load_dataset('.')
Using custom data configuration .-418a6ac4a70df3d8
Downloading and preparing dataset json/. to /home/dave/.cache/huggingface/datasets/json/.-418a6ac4a70df3d8/0.0.0/c90812beea906fcffe0d5e3bb9eba909a80a998b5f88e9f8acbd320aa91acfde...
100%|██████████| 3/3 [00:00<00:00, 13414.62it/s]
100%|██████████| 3/3 [00:00<00:00, 1955.69it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/load.py", line 1694, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/builder.py", line 595, in download_and_prepare
    self._download_and_prepare(
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/builder.py", line 683, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/builder.py", line 1138, in _prepare_split
    writer.write_table(table)
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/arrow_writer.py", line 473, in write_table
    pa_table = pa.Table.from_arrays([pa_table[name] for name in self._schema.names], schema=self._schema)
  File "/home/dave/.local/lib/python3.8/site-packages/datasets/arrow_writer.py", line 473, in <listcomp>
    pa_table = pa.Table.from_arrays([pa_table[name] for name in self._schema.names], schema=self._schema)
  File "pyarrow/table.pxi", line 1339, in pyarrow.lib.Table.__getitem__
  File "pyarrow/table.pxi", line 1900, in pyarrow.lib.Table.column
  File "pyarrow/table.pxi", line 1875, in pyarrow.lib.Table._ensure_integer_index
KeyError: 'Field "builder_name" does not exist in table schema'