Dataset viewer doesn't work if split != "train"

I’m trying to upload a dataset to the Hub. My dataset consists of a test set only; there’s no training set. However, when I specify a split name other than “train” in the dataset card (e.g. “test”), I get the error “The dataset viewer is not available for this split.”

The YAML section at the top of README.md serves as the configuration file, so configuring it there might make it work.

Thanks for the reply! As far as I can tell, I am following the documentation on the page you linked. Here’s the dataset page: roemmele/SceneIllustrations. In the metadata section of README.md, I’m specifying “name: test” under “splits”. Nothing in the metadata references “train”, so “train” should not be expected as a data split.

Hmm… This part?

---
dataset_info:
  features:
    - name: id
      dtype: string
    - name: phase
      dtype: int64
    - name: story
      dtype: string
    - name: position
      dtype: int64
    - name: fragment
      dtype: string
    - name: fragment_id
      dtype: string
    - name: illustration1
      dtype: image
    - name: illustration2
      dtype: image
    - name: annotator_selections
      sequence: string           # was: list: string  ← fix
...
...
---

Or simply remove the features config and keep only the split mapping:

---
license: cc-by-nc-sa-4.0
language: en
task_categories:
  - text-to-image
pretty_name: SceneIllustrations
configs:
  - config_name: default
    data_files:
      - split: test
        path: "data/data_chunk_*.parquet"   # use the real extension: .parquet / .jsonl / .csv
---
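If it helps to avoid indentation slips, the `configs` part of that front matter can be generated rather than typed by hand. This is only a minimal sketch: `render_configs_block` is a hypothetical helper, not part of any Hugging Face library, and it just formats the same keys shown in the snippet above.

```python
def render_configs_block(config_name: str, split_to_glob: dict[str, str]) -> str:
    """Render a `configs:` YAML fragment mapping each split name to a file glob.

    Hypothetical helper for illustration; it only does string formatting.
    """
    lines = ["configs:", f"  - config_name: {config_name}", "    data_files:"]
    for split, glob in split_to_glob.items():
        lines.append(f"      - split: {split}")
        # Quote the path so the `*` is always read as a plain string.
        lines.append(f'        path: "{glob}"')
    return "\n".join(lines)


# A test-only dataset: no "train" entry is emitted anywhere.
block = render_configs_block("default", {"test": "data/data_chunk_*.parquet"})
print(block)
```

The printed fragment can be pasted between the `---` markers of README.md. Note that `configs` sits at the top level here, not nested under `dataset_info`.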

Yes, this is the part that references the “test” split:

  splits:
    - name: test
      num_bytes: 6633552671
      num_examples: 2990
  configs:
    - config_name: default
      data_files:
        - split: test
          path: data/data_chunk_*
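One difference from the suggested config is that this `path` glob has no file extension. A quick way to spot that before pushing is sketched below. It assumes, based only on the comment in the reply above, that the pattern should end in `.parquet`, `.jsonl`, or `.csv`; `has_known_extension` and `KNOWN_EXTENSIONS` are hypothetical names, and the check makes no claim about what the viewer actually requires.

```python
from pathlib import PurePosixPath

# Assumption: the extensions named in the reply above; adjust to the real file type.
KNOWN_EXTENSIONS = {".parquet", ".jsonl", ".csv"}


def has_known_extension(pattern: str) -> bool:
    """Return True if the glob pattern ends in one of the listed extensions."""
    return PurePosixPath(pattern).suffix in KNOWN_EXTENSIONS


for pattern in ("data/data_chunk_*", "data/data_chunk_*.parquet"):
    print(pattern, has_known_extension(pattern))
```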