Dataset viewer issue

I’m creating a dataset that contains a lot of images stored in parquet (column `image`, dict = {"bytes": …, "path": …}). When it was at 90k rows I could see the images in the viewer. But now it shows the raw byte strings (truncated), and the viewer even says "null", yet I can open my parquet files and easily read the images. I don’t understand this behaviour: I haven’t changed any dataset settings, I just uploaded more parquets and my images disappeared from the viewer. The row count is also wrong; it should be approximately 150,000.
[screenshot of the viewer]
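For context, the images are stored roughly like this (a minimal sketch with made-up file and column names, assuming pandas + pyarrow, not my exact pipeline):

```python
import io

import pandas as pd
from PIL import Image

# Each row stores the image as a dict of raw bytes plus an optional path
# (the same layout the `datasets` Image feature uses under the hood).
with open("img_0001.jpg", "rb") as f:
    img_bytes = f.read()

df = pd.DataFrame(
    {
        "image": [{"bytes": img_bytes, "path": "img_0001.jpg"}],
        "caption": ["example row"],  # illustrative extra column
    }
)
df.to_parquet("train-00000.parquet")

# Reading the parquet back locally works fine: the bytes decode to a valid image.
row = pd.read_parquet("train-00000.parquet").iloc[0]
print(Image.open(io.BytesIO(row["image"]["bytes"])).size)
```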

After several updates it’s completely gone. It just shows the text "The dataset viewer should be available soon. Please retry later.", but I’ve been waiting for 3 hours and nothing has happened.

Hi! Can you share the dataset URL, or open a discussion on the repo and ping me (@severo)? Thanks!

Hey, do you have access to private repositories? I’ll ping you there.

No, we can’t access private repos.


Hey! I’ve finally opened the dataset and pinged you in the dataset repository. Can you check what’s wrong with my viewer?


cc @lhoestq


Hello! It still doesn’t work, even after the suggested fixes; now it shows the error below for almost every subset. Can you help me with this? @severo @lhoestq

Cannot load the dataset split (in streaming mode) to extract the first rows.
Error code:   StreamingRowsError
Exception:    ArrowNotImplementedError
Message:      Unsupported cast from list<item: struct<name: string, sex: string, colors: list<element: string>, styles: list<element: string>, materials: list<element: string>, length: string, fit: string>> to struct using function cast_struct
Traceback:    Traceback (most recent call last):
                File "/src/services/worker/src/worker/utils.py", line 99, in get_rows_or_raise
                  return get_rows(
                File "/src/libs/libcommon/src/libcommon/utils.py", line 197, in decorator
                  return func(*args, **kwargs)
                File "/src/services/worker/src/worker/utils.py", line 77, in get_rows
                  rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 2093, in __iter__
                  for key, example in ex_iterable:
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 279, in __iter__
                  for key, pa_table in self.generate_tables_fn(**gen_kwags):
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/parquet/parquet.py", line 93, in _generate_tables
                  yield f"{file_idx}_{batch_idx}", self._cast_table(pa_table)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/parquet/parquet.py", line 71, in _cast_table
                  pa_table = table_cast(pa_table, self.info.features.arrow_schema)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/table.py", line 2292, in table_cast
                  return cast_table_to_schema(table, schema)
                File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/table.py", line 2252, in cast_table_to_schema
                  return pa.Table.from_arrays(arrays, schema=schema)
                File "pyarrow/table.pxi", line 3974, in pyarrow.lib.Table.from_arrays
                File "pyarrow/table.pxi", line 1464, in pyarrow.lib._sanitize_arrays
                File "pyarrow/array.pxi", line 370, in pyarrow.lib.asarray
                File "pyarrow/table.pxi", line 566, in pyarrow.lib.ChunkedArray.cast
                File "/src/services/worker/.venv/lib/python3.9/site-packages/pyarrow/compute.py", line 404, in cast
                  return call_function("cast", [arr], options, memory_pool)
                File "pyarrow/_compute.pyx", line 590, in pyarrow._compute.call_function
                File "pyarrow/_compute.pyx", line 385, in pyarrow._compute.Function.call
                File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
                File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
              pyarrow.lib.ArrowNotImplementedError: Unsupported cast from list<item: struct<name: string, sex: string, colors: list<element: string>, styles: list<element: string>, materials: list<element: string>, length: string, fit: string>> to struct using function cast_struct

I think this happens because I store lists of dicts in multiple columns (maybe the viewer can’t cast them), but I don’t understand why it worked before: I didn’t re-upload any data, the fix was only about the configs in the README.
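To illustrate what I suspect (a rough, self-contained reproduction using a simplified version of the struct from the traceback, not my real schema): if the declared features say a column is a plain struct while the parquet files actually hold a list of structs, PyArrow refuses the cast with exactly this error:

```python
import pyarrow as pa

# What the parquet files actually contain: a list of structs per row.
actual = pa.table(
    {"items": [[{"name": "a", "fit": "slim"}], [{"name": "b", "fit": "loose"}]]}
)

# What the declared features (e.g. dataset_info in the README) say: a single struct.
declared = pa.schema(
    [("items", pa.struct([("name", pa.string()), ("fit", pa.string())]))]
)

try:
    actual.cast(declared)
except pa.ArrowNotImplementedError as e:
    # "Unsupported cast from list<item: struct<...>> to struct using function cast_struct"
    print(e)
```

If that is indeed the mismatch, declaring those columns as a list/sequence of the struct in the README config, instead of a bare struct, should let the cast go through.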
