Dataset preview rendering with NULL

Hi! I’m new to Huggingface and am encountering a problem for generating audio dataset preview.

I doesn’t know why the audio datasets preview contains many NULL values in the messages, and doesn’t show the audio file preview. I have read the docs and also read some posts, but it seems like I cannot find any problem same as mine.

Here’s the error.

Error code:   DatasetGenerationError
Exception:    ArrowInvalid
Message:      JSON parse error: Column(/messages/[]/content/[]/text) changed from string to number in row 67
Traceback:    Traceback (most recent call last):
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 160, in _generate_tables
                  df = pandas_read_json(f)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 38, in pandas_read_json
                  return pd.read_json(path_or_buf, **kwargs)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 815, in read_json
                  return json_reader.read()
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1025, in read
                  obj = self._get_object_parser(self.data)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
                  obj = FrameParser(json, **kwargs).parse()
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1187, in parse
                  self._parse()
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1403, in _parse
                  ujson_loads(json, precise_float=self.precise_float), dtype=None
              ValueError: Trailing data
              
              During handling of the above exception, another exception occurred:
              
              Traceback (most recent call last):
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1854, in _prepare_split_single
                  for _, table in generator:
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 163, in _generate_tables
                  raise e
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 137, in _generate_tables
                  pa_table = paj.read_json(
                File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
                File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
                File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
              pyarrow.lib.ArrowInvalid: JSON parse error: Column(/messages/[]/content/[]/text) changed from string to number in row 67
              
              The above exception was the direct cause of the following exception:
              
              Traceback (most recent call last):
                File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 1417, in compute_config_parquet_and_info_response
                  parquet_operations = convert_to_parquet(builder)
                File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 1049, in convert_to_parquet
                  builder.download_and_prepare(
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 924, in download_and_prepare
                  self._download_and_prepare(
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1000, in _download_and_prepare
                  self._prepare_split(split_generator, **prepare_split_kwargs)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1741, in _prepare_split
                  for job_id, done, content in self._prepare_split_single(
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/builder.py", line 1897, in _prepare_split_single
                  raise DatasetGenerationError("An error occurred while generating the dataset") from e
              datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

Here’s the link of the dataset. I would really appreciate someone take a look and help me, and thanks in advance!

1 Like