I'm trying to load the Common Voice 17 HU dataset, but I always get an error when it reaches the example-generation step. The error:
Extracting data files: 100%|██████████| 6/6 [00:00<00:00, 114.46it/s]
Reading metadata...: 37140it [00:00, 133407.71it/s]
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 1627, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\features\features.py", line 1813, in encode_example
return encode_nested_example(self, example)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\features\features.py", line 1212, in encode_nested_example
{
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\utils\py_utils.py", line 302, in zip_dict
yield key, tuple(d[key] for d in dicts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\utils\py_utils.py", line 302, in <genexpr>
yield key, tuple(d[key] for d in dicts)
~^^^^^
KeyError: 'sentence_id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\torch-directml\dl_dataset.py", line 3, in <module>
train = load_dataset("mozilla-foundation/common_voice_17_0", "hu", use_auth_token="")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\load.py", line 1791, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 891, in download_and_prepare
self._download_and_prepare(
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 1651, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 986, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 1490, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\tothg\AppData\Local\Programs\Python\Python312\Lib\site-packages\datasets\builder.py", line 1646, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
The code is just:
from datasets import load_dataset
train = load_dataset("mozilla-foundation/common_voice_17_0", "hu", use_auth_token="")
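For context on what the innermost frames of the traceback are doing, here is a simplified sketch of `zip_dict`, written from the line shown in the traceback (`yield key, tuple(d[key] for d in dicts)`). It is not guaranteed to match the library's exact code. The idea: `encode_example` walks the declared features and the generated record together. The keys come from the union of both dicts, so if either the feature schema or the record lacks a key (here `sentence_id`), the lookup raises `KeyError`. The `schema` and `record` contents below are hypothetical and are not the real Common Voice 17 feature definitions. The sketch only shows the mechanism, not which side is missing the field for the HU config.

```python
import itertools


def unique_values(values):
    """Yield each value once, preserving first-seen order."""
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            yield value


def zip_dict(*dicts):
    """Group the values of several dicts by key (simplified sketch).

    Keys are taken from the union of all dicts, so if any dict is missing
    a key, the lookup d[key] raises KeyError, as in the traceback.
    """
    for key in unique_values(itertools.chain(*dicts)):
        yield key, tuple(d[key] for d in dicts)


# Hypothetical schema and record, for illustration only.
schema = {"path": "string", "sentence": "string"}
record = {"path": "hu_0001.mp3", "sentence": "Jo napot!", "sentence_id": "abc123"}

# Matching keys pair up cleanly.
print(dict(zip_dict({"a": 1}, {"a": 2})))  # prints: {'a': (1, 2)}

# A key present in only one dict triggers the same error as in the post.
try:
    dict(zip_dict(schema, record))
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'sentence_id'
```

In other words, the error means the fields produced while reading the HU metadata and the features the loader declared do not line up on `sentence_id`.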