Dataset viewer doesn't work if split != "train"

I’m trying to upload a dataset to the Hub. My dataset consists of a test set only; there’s no training set. However, when I specify a split name other than “train” in the dataset card (e.g. “test”), I get the error “The dataset viewer is not available for this split.”


The YAML section at the top of README.md serves as the configuration file, so configuring it there might make it work.

Thanks for the reply! As far as I can tell, I am following the documentation in the page you linked. Here’s the dataset page: roemmele/SceneIllustrations · Datasets at Hugging Face. In the metadata section of README.md, I’m specifying “name: test” under “splits”. I don’t have anything in the metadata referencing “train”, so “train” should not be expected as a data split.
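To see exactly what the Hub reads as configuration, it can help to pull the YAML block out of the README and list the split names it assigns data files to. Below is a minimal, stdlib-only sketch of that check. The helper names `extract_front_matter` and `data_file_splits` are hypothetical, and the split scan is a naive line match, not a real YAML parser.

```python
def extract_front_matter(readme_text: str) -> str:
    """Return the YAML text between the opening '---' line and the next '---' line.

    Returns "" if the README does not start with a complete front-matter block.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # opening '---' with no closing '---'


def data_file_splits(front_matter: str) -> list[str]:
    """Naively collect NAME from every '- split: NAME' line (not a YAML parser)."""
    found = []
    for line in front_matter.splitlines():
        stripped = line.strip()
        if stripped.startswith("- split:"):
            found.append(stripped.split(":", 1)[1].strip())
    return found


# Illustrative README, loosely modeled on the card discussed in this thread.
readme = """---
license: cc-by-nc-sa-4.0
configs:
  - config_name: default
    data_files:
      - split: test
        path: "data/data_chunk_*.parquet"
---
# SceneIllustrations
"""

front_matter = extract_front_matter(readme)
print(data_file_splits(front_matter))  # ['test']
```

If the printed list is empty, or a split name is missing from it, the front matter is not declaring data files for that split the way you intended.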


Hmm… This part?

---
dataset_info:
  features:
    - name: id
      dtype: string
    - name: phase
      dtype: int64
    - name: story
      dtype: string
    - name: position
      dtype: int64
    - name: fragment
      dtype: string
    - name: fragment_id
      dtype: string
    - name: illustration1
      dtype: image
    - name: illustration2
      dtype: image
    - name: annotator_selections
      sequence: string           # was: list: string  ← fix
...
...
---

Or simply remove the features configuration:

---
license: cc-by-nc-sa-4.0
language: en
task_categories:
  - text-to-image
pretty_name: SceneIllustrations
configs:
  - config_name: default
    data_files:
      - split: test
        path: "data/data_chunk_*.parquet"   # use the real extension: .parquet / .jsonl / .csv
---
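If you would rather generate the configs block than edit it by hand, here is a small sketch that emits the same layout as the card above. `render_configs_yaml` is a hypothetical helper, not part of any Hub library. It quotes each path so that a pattern beginning with `*` is not read as a YAML alias.

```python
def render_configs_yaml(split_patterns: dict[str, list[str]], config_name: str = "default") -> str:
    """Render a 'configs:' block mapping each split to one or more path globs.

    A single pattern is written inline; several patterns become a YAML list.
    Every pattern is double-quoted so glob characters are never parsed as YAML syntax.
    """
    lines = ["configs:", f"  - config_name: {config_name}", "    data_files:"]
    for split, patterns in split_patterns.items():
        lines.append(f"      - split: {split}")
        if len(patterns) == 1:
            lines.append(f'        path: "{patterns[0]}"')
        else:
            lines.append("        path:")
            lines.extend(f'          - "{p}"' for p in patterns)
    return "\n".join(lines)


print(render_configs_yaml({"test": ["data/data_chunk_*.parquet"]}))
```

The printed block can be pasted between the `---` lines of README.md next to the other top-level keys such as `license`.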

Yes, this is the part that references the “test” split:

  splits:
    - name: test
      num_bytes: 6633552671
      num_examples: 2990
  configs:
    - config_name: default
      data_files:
        - split: test
          path: data/data_chunk_*

The path seems slightly wrong. Try this as well:

  splits:
    - name: test
      num_bytes: 6633552671
      num_examples: 2990
  configs:
    - config_name: default
      data_files:
        - split: test
          path:
            - data_chunk_*.parquet
            - data/data_chunk_*.parquet
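One way to sanity-check a `data_files` pattern is to test it against the file names that are actually in the repository. The sketch below approximates glob matching with the standard library and treats `*` as never crossing a `/`. It does not support `**`, and the Hub's own resolution may differ in details. The file listing is hypothetical and assumes the chunks live under `data/`.

```python
from fnmatch import fnmatchcase


def glob_match(pattern: str, path: str) -> bool:
    """Approximate glob match in which '*' never crosses a '/' separator.

    Each '/'-separated segment of the pattern must match the corresponding
    segment of the path; '**' (recursive) patterns are not supported.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(name, pat) for name, pat in zip(path_parts, pattern_parts))


def resolve(patterns: list[str], repo_files: list[str]) -> list[str]:
    """Return the repository files matched by at least one pattern."""
    return [f for f in repo_files if any(glob_match(p, f) for p in patterns)]


# Hypothetical repository listing with the chunks stored under data/.
repo_files = [
    "README.md",
    "data/data_chunk_0.parquet",
    "data/data_chunk_1.parquet",
]

print(resolve(["data/data_chunk_*"], repo_files))
# ['data/data_chunk_0.parquet', 'data/data_chunk_1.parquet']
print(resolve(["data_chunk_*.parquet"], repo_files))
# []
print(resolve(["data_chunk_*.parquet", "data/data_chunk_*.parquet"], repo_files))
# ['data/data_chunk_0.parquet', 'data/data_chunk_1.parquet']
```

Under this approximation, the original pattern `data/data_chunk_*` already matches the parquet chunks. On its own, `data_chunk_*.parquet` matches only files at the repository root, so the list form above matches the chunks only through its second entry.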

Ah, thanks for pointing out the issue with the path. I’ve fixed that, but I’m still getting the same “Bad split” error.


The Dataset Viewer doesn’t seem to work unless a train split exists (even a dummy one). This should make it work. Also, if “list” doesn’t work in the features, try “sequence” instead.

  splits:
    - name: train
      num_examples: 2990
      num_bytes: 6633552671
    - name: test
      num_examples: 2990
      num_bytes: 6633552671
  configs:
    - config_name: default
      data_files:
        - split: train
          path: data_chunk_*.parquet
        - split: test
          path: data_chunk_*.parquet
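The excerpts in this thread indent `configs:` at the same level as `splits:`. If the README is actually laid out that way, `configs` ends up nested under `dataset_info`, whereas the Hub's dataset-card documentation places `configs` as a top-level key. Below is a hypothetical checker for that mistake and for split names that appear in only one of the two sections. It assumes the front matter has already been loaded into a dict (for example with a YAML library). It parses nothing itself and is not a Hub tool.

```python
def check_card_metadata(meta: dict) -> list[str]:
    """Return warnings about how splits are declared in parsed card metadata.

    `meta` is assumed to be the already-parsed YAML front matter; this sketch
    only inspects the resulting dict and does not read YAML itself.
    """
    warnings = []
    info = meta.get("dataset_info", {})
    if "configs" in info:
        warnings.append("'configs' is nested under 'dataset_info'; expected a top-level key")
    configs = meta.get("configs") or info.get("configs") or []

    declared = {s["name"] for s in info.get("splits", [])}
    with_files = {
        entry["split"]
        for cfg in configs
        for entry in cfg.get("data_files", [])
        if isinstance(entry, dict) and "split" in entry  # skip plain string paths
    }
    for name in sorted(declared - with_files):
        warnings.append(f"split '{name}' is in dataset_info.splits but has no data_files entry")
    for name in sorted(with_files - declared):
        warnings.append(f"split '{name}' has data_files but is missing from dataset_info.splits")
    return warnings


# Mirrors the thread's excerpt, with configs indented under dataset_info.
nested = {
    "dataset_info": {
        "splits": [{"name": "test", "num_bytes": 6633552671, "num_examples": 2990}],
        "configs": [
            {"config_name": "default",
             "data_files": [{"split": "test", "path": "data/data_chunk_*"}]},
        ],
    },
}
for warning in check_card_metadata(nested):
    print(warning)

# Same content with configs moved to the top level.
top_level = {
    "dataset_info": {"splits": [{"name": "test", "num_examples": 2990}]},
    "configs": [
        {"config_name": "default",
         "data_files": [{"split": "test", "path": "data/data_chunk_*.parquet"}]},
    ],
}
print(check_card_metadata(top_level))  # []
```

The checker reports the nesting problem for the first dict and nothing for the second. A split that is declared under `splits` but has no `data_files` entry, or the reverse, also produces a warning.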

Thanks, but even with that config, the dataset viewer still fails to display the test split (same “bad split” error). I think this is an issue that should be addressed in a future update to the dataset viewer. Do you know where to file it?


the dataset viewer still fails to display the test split (same “bad split” error)

Seems so…

do you know where to file this issue?

Here.
