Hi! I’m new to Huggingface and am encountering a problem for generating audio dataset preview.
I doesn’t know why the audio datasets preview contains many NULL values in the messages, and doesn’t show the audio file preview. I have read the docs and also read some posts, but it seems like I cannot find any problem same as mine.
Here’s the error.
Error code: DatasetGenerationError
Exception: ArrowInvalid
Message: JSON parse error: Column(/messages/[]/content/[]/text) changed from string to number in row 67
Traceback: Traceback (most recent call last):
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 160, in _generate_tables
df = pandas_read_json(f)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 38, in pandas_read_json
return pd.read_json(path_or_buf, **kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 815, in read_json
return json_reader.read()
File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1025, in read
obj = self._get_object_parser(self.data)
File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1187, in parse
self._parse()
File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1403, in _parse
ujson_loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1854, in _prepare_split_single
for _, table in generator:
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 163, in _generate_tables
raise e
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 137, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column(/messages/[]/content/[]/text) changed from string to number in row 67
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 1417, in compute_config_parquet_and_info_response
parquet_operations = convert_to_parquet(builder)
File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 1049, in convert_to_parquet
builder.download_and_prepare(
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1000, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1741, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1897, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
Here’s the link of the dataset. I would really appreciate someone take a look and help me, and thanks in advance!