Dataset viewer doesn't work if split != "train"

roemmele · September 26, 2025, 10:27pm

I’m trying to upload a dataset to the Hub. My dataset consists of a test set only; there’s no training set. However, when I specify a split name other than “train” in the dataset card (e.g. “test”), I get the error “The dataset viewer is not available for this split.”

John6666 · September 26, 2025, 10:36pm

The YAML section at the top of README.md serves as the configuration file, so configuring it there might make it work.
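As a rough illustration of what that means, the sketch below pulls the front matter out of a README string and lists the split names declared under data_files. It uses only the standard library and a regular expression rather than a real YAML parser, and the function names and the sample README are made up for the example. It is not how the Hub itself reads the file.

```python
import re


def front_matter(readme_text: str) -> str:
    """Return the text between the first two '---' lines, or '' if absent."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # opening '---' without a closing one


def declared_splits(readme_text: str) -> list[str]:
    """Collect names from '- split: NAME' entries, in order, without duplicates."""
    names = re.findall(
        r"^\s*-\s*split:\s*[\"']?([\w.-]+)",
        front_matter(readme_text),
        flags=re.MULTILINE,
    )
    return list(dict.fromkeys(names))


# Hypothetical README contents, modelled on the configs discussed in this thread.
SAMPLE_README = """\
---
configs:
  - config_name: default
    data_files:
      - split: test
        path: "data/data_chunk_*.parquet"
---
# SceneIllustrations
"""

print(declared_splits(SAMPLE_README))
```

If the list printed here does not contain the split you expect, the front matter is the first place to look.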

roemmele · September 26, 2025, 11:31pm

Thanks for the reply! As far as I can tell, I am following the documentation on the page you linked. Here’s the dataset page: roemmele/SceneIllustrations on Hugging Face. In the metadata section of README.md, I’m specifying “name: test” under “splits”. I don’t have anything in the metadata referencing “train”, so “train” should not be expected as a data split.

John6666 · September 27, 2025, 2:11am

Hmm… This part?

---
dataset_info:
  features:
    - name: id
      dtype: string
    - name: phase
      dtype: int64
    - name: story
      dtype: string
    - name: position
      dtype: int64
    - name: fragment
      dtype: string
    - name: fragment_id
      dtype: string
    - name: illustration1
      dtype: image
    - name: illustration2
      dtype: image
    - name: annotator_selections
      sequence: string           # was: list: string  ← fix
...
...
---

Or simply remove the features config:

---
license: cc-by-nc-sa-4.0
language: en
task_categories:
  - text-to-image
pretty_name: SceneIllustrations
configs:
  - config_name: default
    data_files:
      - split: test
        path: "data/data_chunk_*.parquet"   # use the real extension: .parquet / .jsonl / .csv
---

roemmele · September 29, 2025, 6:03pm

Yes, this is the part that references the “test” split:

  splits:
    - name: test
      num_bytes: 6633552671
      num_examples: 2990
  configs:
    - config_name: default
      data_files:
        - split: test
          path: data/data_chunk_*

John6666 · September 29, 2025, 9:30pm

It seems the path is slightly wrong. Try this as well.

  splits:
    - name: test
      num_bytes: 6633552671
      num_examples: 2990
  configs:
    - config_name: default
      data_files:
        - split: test
          path:
            - data_chunk_*.parquet
            - data/data_chunk_*.parquet
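To see why the pattern matters, here is a small sketch that checks a few candidate patterns against a list of file paths. The file names are hypothetical (the real repository layout may differ), and Python's fnmatch only approximates the glob matching used on the Hub, so treat it as a sanity check rather than the actual resolution logic.

```python
from fnmatch import fnmatchcase

# Hypothetical repository listing; the actual file names are an assumption.
REPO_FILES = [
    "README.md",
    "data/data_chunk_0.parquet",
    "data/data_chunk_1.parquet",
]


def matching_files(pattern: str, files: list[str]) -> list[str]:
    """Return the paths that the whole pattern matches (case-sensitive)."""
    return [f for f in files if fnmatchcase(f, pattern)]


for pattern in (
    "data/data_chunk_*",
    "data_chunk_*.parquet",
    "data/data_chunk_*.parquet",
):
    print(pattern, "->", matching_files(pattern, REPO_FILES))
```

A pattern that matches nothing would leave that split without any files, which is one reason to list both candidate locations as in the config above.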

roemmele · September 30, 2025, 6:14pm

Ah, thanks for pointing out the issue with the path. I’ve fixed that, but I’m still getting the same “Bad split” error.

John6666 · September 30, 2025, 8:39pm

The dataset viewer GUI doesn’t seem to work unless a train split exists (even a dummy one). This should make it work. Also, if list doesn’t work in features, try using sequence.

  splits:
    - name: train
      num_examples: 2990
      num_bytes: 6633552671
    - name: test
      num_examples: 2990
      num_bytes: 6633552671
  configs:
    - config_name: default
      data_files:
        - split: train
          path: data_chunk_*.parquet
        - split: test
          path: data_chunk_*.parquet

roemmele · October 1, 2025, 1:13am

Thanks, but even with that config, the dataset viewer still fails to display the test split (same “bad split” error). I’m thinking this is an issue that should be addressed in a future code update to the dataset viewer - do you know where to file this issue?

John6666 · October 1, 2025, 1:18am

> the dataset viewer still fails to display the test split (same “bad split” error)

Seems so…

> do you know where to file this issue?

Here.
