I’m trying to upload a dataset to the Hub. My dataset consists of a test set only; there’s no training set. However, when I specify a split name other than “train” in the dataset card (e.g. “test”), I get the error “The dataset viewer is not available for this split.”
The YAML section at the top of README.md serves as the configuration file, so configuring it there might make it work.
Thanks for the reply! As far as I can tell, I am following the documentation on the page you linked. Here’s the dataset page: roemmele/SceneIllustrations · Datasets at Hugging Face. In the metadata section of README.md, I’m specifying “name: test” under “splits”. Nothing in the metadata references “train”, so “train” should not be expected as a data split.
Hmm… This part?
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: phase
    dtype: int64
  - name: story
    dtype: string
  - name: position
    dtype: int64
  - name: fragment
    dtype: string
  - name: fragment_id
    dtype: string
  - name: illustration1
    dtype: image
  - name: illustration2
    dtype: image
  - name: annotator_selections
    sequence: string  # was: list: string ← fix
...
...
---
Or simply remove the features config and declare only the split and its data files:
---
license: cc-by-nc-sa-4.0
language: en
task_categories:
- text-to-image
pretty_name: SceneIllustrations
configs:
- config_name: default
  data_files:
  - split: test
    path: "data/data_chunk_*.parquet"  # use the real extension: .parquet / .jsonl / .csv
---
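To see which files a `path` pattern would pick up before pushing, a rough local check can help. The sketch below uses Python's `fnmatch` as a stand-in for the Hub's pattern matching (an assumption; the Hub's resolver may differ in details), and the file names are hypothetical rather than taken from the actual repository.

```python
from fnmatch import fnmatch

# Hypothetical repository listing; replace with the real file names.
repo_files = [
    "README.md",
    "data/data_chunk_0.parquet",
    "data/data_chunk_1.parquet",
    "data/data_chunk_notes.txt",
]


def match_files(pattern, files):
    """Return the files that match a glob-style pattern."""
    return [f for f in files if fnmatch(f, pattern)]


# Pattern without an extension: also catches non-data files with the same prefix.
loose = match_files("data/data_chunk_*", repo_files)

# Pattern with an explicit extension: only the parquet chunks.
strict = match_files("data/data_chunk_*.parquet", repo_files)

print("without extension:", loose)
print("with extension:", strict)
```

If the looser pattern matches anything besides the data chunks, spelling out the extension is the safer choice.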
Yes, this is the part that references the “test” split:
  splits:
  - name: test
    num_bytes: 6633552671
    num_examples: 2990
configs:
- config_name: default
  data_files:
  - split: test
    path: data/data_chunk_*
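For a quick sanity check that the card really declares only “test”, something like the following could be run against README.md. It is a minimal sketch: the README text is a shortened, hypothetical stand-in, the helper names `front_matter` and `declared_splits` are made up for illustration, and the matching is line-based rather than a full YAML parse.

```python
import re

# Hypothetical, shortened README.md content for illustration.
readme = """---
dataset_info:
  splits:
  - name: test
    num_examples: 2990
configs:
- config_name: default
  data_files:
  - split: test
    path: data/data_chunk_*
---
# SceneIllustrations
"""


def front_matter(text):
    """Return the YAML block between the leading '---' delimiters, or ''."""
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    return match.group(1) if match else ""


def declared_splits(yaml_text):
    """Collect names from '- split: NAME' entries (line-based, not a YAML parser)."""
    return re.findall(r"^\s*-\s*split:\s*(\S+)\s*$", yaml_text, flags=re.MULTILINE)


splits = declared_splits(front_matter(readme))
print(splits)
```

If the printed list contains only `test`, then the “train” expectation is not coming from the `configs` section of the card.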