I’m trying to upload a dataset to the Hub. My dataset consists of a test set only; there’s no training set. However, when I try to specify a split name other than “train” in the dataset card (e.g. “test”), I get the error “The dataset viewer is not available for this split.”
The YAML section at the top of README.md serves as the configuration file, so configuring it there might make it work.
Thanks for the reply! As far as I can tell, I am following the documentation in the page you linked. Here’s the dataset page: roemmele/SceneIllustrations · Datasets at Hugging Face. In the metadata section of README.md, I’m specifying “name: test” under “splits”. I don’t have anything in the metadata referencing “train”, so “train” should not be expected as a data split.
Hmm… This part?
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: phase
    dtype: int64
  - name: story
    dtype: string
  - name: position
    dtype: int64
  - name: fragment
    dtype: string
  - name: fragment_id
    dtype: string
  - name: illustration1
    dtype: image
  - name: illustration2
    dtype: image
  - name: annotator_selections
    sequence: string # was: list: string ← fix
...
...
---
Or simply remove the features config:
---
license: cc-by-nc-sa-4.0
language: en
task_categories:
- text-to-image
pretty_name: SceneIllustrations
configs:
- config_name: default
  data_files:
  - split: test
    path: "data/data_chunk_*.parquet" # use the real extension: .parquet / .jsonl / .csv
---
Yes, this is the part that references the “test” split:
splits:
- name: test
  num_bytes: 6633552671
  num_examples: 2990
configs:
- config_name: default
  data_files:
  - split: test
    path: data/data_chunk_*
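To double-check that nothing in the metadata references “train”, the split names declared under data_files can be listed with a few lines of Python. This is only a rough sketch: the helper name declared_splits and the sample README text are made up for illustration, and it scans the front matter with a regular expression rather than a real YAML parser.

```python
import re

def declared_splits(readme_text):
    """Return the split names listed as '- split: NAME' in the README front matter."""
    # The front matter is the block between the leading '---' line and the next '---' line.
    match = re.match(r"^---[ \t]*\n(.*?)\n---[ \t]*(\n|$)", readme_text, flags=re.S)
    if match is None:
        return []
    front_matter = match.group(1)
    return re.findall(r"^[ \t]*-[ \t]*split:[ \t]*([\w.-]+)", front_matter, flags=re.M)

# Hypothetical README content modeled on the snippet above.
sample_readme = """---
configs:
- config_name: default
  data_files:
  - split: test
    path: data/data_chunk_*
---
# SceneIllustrations
"""

print(declared_splits(sample_readme))  # ['test']
```

If “train” does not show up in that list, the README itself is not asking for a train split.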
The path seems slightly wrong. Try this as well.
splits:
- name: test
  num_bytes: 6633552671
  num_examples: 2990
configs:
- config_name: default
  data_files:
  - split: test
    path:
    - data_chunk_*.parquet
    - data/data_chunk_*.parquet
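A quick way to see why the directory prefix matters is to test the patterns against a file listing locally. The sketch below uses Python's fnmatch module as an approximation; the Hub resolves data_files patterns with its own glob logic, which may differ in details, and the file names here are hypothetical.

```python
from fnmatch import fnmatchcase

def matching_files(repo_files, patterns):
    """Return the files matched by at least one pattern, keeping the original order."""
    return [f for f in repo_files if any(fnmatchcase(f, p) for p in patterns)]

# Hypothetical repository listing; the real file names may differ.
repo_files = [
    "README.md",
    "data/data_chunk_0.parquet",
    "data/data_chunk_1.parquet",
]

# Without the "data/" prefix, nothing inside the data directory matches.
print(matching_files(repo_files, ["data_chunk_*.parquet"]))

# With the prefix, both chunk files match.
print(matching_files(repo_files, ["data/data_chunk_*.parquet"]))
```

Listing both patterns, as in the config above, covers either layout: files at the repository root or inside a data directory.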
Ah, thanks for pointing out the issue with the path. I’ve fixed that, but I’m still getting the same “Bad split” error.
The dataset viewer GUI doesn’t seem to work unless a train split exists (even a dummy one), so this should make it work. Also, if list doesn’t work in the features, try using sequence.
splits:
- name: train
  num_examples: 2990
  num_bytes: 6633552671
- name: test
  num_examples: 2990
  num_bytes: 6633552671
configs:
- config_name: default
  data_files:
  - split: train
    path: data_chunk_*.parquet
  - split: test
    path: data_chunk_*.parquet
Thanks, but even with that config, the dataset viewer still fails to display the test split (same “bad split” error). I’m thinking this is an issue that should be addressed in a future code update to the dataset viewer - do you know where to file this issue?
the dataset viewer still fails to display the test split (same “bad split” error)
Seems so…
do you know where to file this issue?
Here.
