Error, dataset could not be generated

In file “/hface1/env/lib/python3.11/site-packages/datasets/builder.py”, line 1746, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File “/hface1/env/lib/python3.11/site-packages/datasets/builder.py”, line 1891, in _prepare_split_single
raise DatasetGenerationError(“An error occurred while generating the dataset”) from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
What should I do next? The data is downloaded, but there seems to be a bug in builder.py when it generates the training set.

Hi! Can you share the entire error stack trace? Also, feel free to share a link to the problematic dataset if it’s public.

I tried again and got the same problem.
I also tried different datasets, the-stack-dedup and the-stack.
After spending the past 5 days downloading and extracting, I get this error:

Extracting data files: 100%|███████████████| 1/1 [03:32<00:00, 212.39s/it]
Traceback (most recent call last):
File “/media/env/lib/python3.11/site-packages/datasets/builder.py”, line 1858, in _prepare_split_single
for _, table in generator:
File “/media/env/lib/python3.11/site-packages/datasets/packaged_modules/parquet/parquet.py”, line 67, in _generate_tables
parquet_file = pq.ParquetFile(f)
^^^^^^^^^^^^^^^^^
File “/media/env/lib/python3.11/site-packages/pyarrow/parquet/core.py”, line 334, in __init__
self.reader.open(
File “pyarrow/_parquet.pyx”, line 1220, in pyarrow._parquet.ParquetReader.open
File “pyarrow/error.pxi”, line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “”, line 1, in
File “/media/env/lib/python3.11/site-packages/datasets/load.py”, line 1797, in load_dataset
builder_instance.download_and_prepare(
File “/media/env/lib/python3.11/site-packages/datasets/builder.py”, line 890, in download_and_prepare
self._download_and_prepare(
File “/media/env/lib/python3.11/site-packages/datasets/builder.py”, line 985, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File “/media/env/lib/python3.11/site-packages/datasets/builder.py”, line 1746, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File “/media/env/lib/python3.11/site-packages/datasets/builder.py”, line 1891, in _prepare_split_single
raise DatasetGenerationError(“An error occurred while generating the dataset”) from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
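
The ArrowInvalid message suggests that at least one downloaded file is not a complete Parquet file, rather than pointing to a bug in builder.py. A valid Parquet file starts and ends with the 4-byte marker PAR1, so one way to narrow this down is to scan the downloaded files for that marker. Below is a minimal sketch; the helper names, the directory argument, and the ".parquet" suffix filter are assumptions for illustration and are not part of the datasets library.

```python
import os

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these 4 bytes


def has_parquet_magic(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    # A file too small to hold both markers cannot be valid Parquet.
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def find_bad_parquet_files(root, suffix=".parquet"):
    """Walk `root` and return a sorted list of files that fail the magic-byte check.

    `suffix` is an assumption: pass suffix="" to check every file, for example
    when the cached downloads are stored under names without an extension.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                if not has_parquet_magic(path):
                    bad.append(path)
    return sorted(bad)


# Example usage (the path is hypothetical; point it at your own download directory):
# for path in find_bad_parquet_files("/path/to/downloaded/files"):
#     print("not a complete Parquet file:", path)
```

Any file this reports was probably truncated or is not Parquet at all, and downloading it again is likely to help. This check only looks at the markers, so a file that passes can still be damaged in the middle.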