Dataset viewer not showing subsets?

drvenabili · September 24, 2025, 12:04pm

Hi,

I’m running into an issue I can’t find a solution for. I have created a dataset (iguanodon-ai/kubhist2 on the Hugging Face Hub).

It has one split (train) but several configs. To set this up I followed the “Structure your repository” documentation:

  - config_name: '1710'
    data_files:
      - split: train
        path: data/1710/train/*.parquet
  - config_name: '1720'
    data_files:
      - split: train
        path: data/1720/train/*.parquet

The structure is the same as in this HF-released dataset (HuggingFaceFW/finepdfs, main branch).

The problem I’m having is that on my dataset the viewer shows only the split and not the subsets.

On the HF dataset, however, the viewer does show all the subsets.

What is going on? Can someone please enlighten me? Is it because I’m uploading Parquet files and the automated Parquet converter is trying to change the data?

Thanks!

John6666 · September 24, 2025, 12:26pm

Oh, it seems to be working now (maybe thanks to your own commit).

drvenabili · September 24, 2025, 12:28pm

Hi! Thanks, but I don’t see any change: the viewer only shows one split (train), not the subsets (1640, 1650, 1660, etc.). Could you please share a screenshot of what you see?

John6666 · September 24, 2025, 12:31pm

the viewer only shows one split (train), and not the subsets (1640, 1650, 1660, etc.)

sure…

drvenabili · September 24, 2025, 12:32pm

Thanks. Unfortunately, that’s exactly what I don’t understand. I have the exact same config as HuggingFaceFW/finepdfs (in its README.md), and yet the result is different.

John6666 · September 24, 2025, 12:34pm

How about this…?

Why it still shows only one subset: your README YAML uses metadata: dataset_info: rather than the configs: schema the viewer expects. The Hub viewer reads subsets from configs:. Your README currently lists the decades under metadata → dataset_info, plus an all config marked default: true, and that does not populate the viewer’s subset dropdown. (Hugging Face)

Fix precisely:

1. Put a YAML front-matter block that uses configs: at the very top of README.md.
2. Keep your all entry if you want it as the default, but list every decade under configs: too.
3. Remove the metadata: wrapper. Optionally, keep a separate dataset_info: block if you want, but it does not drive the viewer.

Minimal example to paste at the very top of README.md:

---
configs:
  - config_name: "1640"
    data_files:
      - split: train
        path: data/1640/train/*.parquet
  - config_name: "1650"
    data_files:
      - split: train
        path: data/1650/train/*.parquet
  # …repeat for all decades…
  - config_name: "all"
    default: true
    data_files:
      - split: train
        path: data/all/train/*.parquet
---
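Writing out "…repeat for all decades…" by hand is easy to get wrong when there are many decades. A minimal sketch, assuming the layout from this thread (data/<decade>/train/*.parquet plus data/all/train/*.parquet); the function name build_configs_front_matter and the decade range in the example call are hypothetical, for illustration only:

```python
"""Hypothetical helper: build a configs: front-matter block for per-decade subsets.

Assumes the repository layout discussed in this thread:
data/<decade>/train/*.parquet and data/all/train/*.parquet.
"""


def build_configs_front_matter(decades, default="all"):
    """Return README.md YAML front matter with one config per decade.

    If `default` is not None, that config is appended last and marked
    default: true, mirroring the example above.
    """
    names = [str(d) for d in decades]
    if default is not None:
        names.append(default)

    lines = ["---", "configs:"]
    for name in names:
        lines.append(f'  - config_name: "{name}"')
        if name == default:
            lines.append("    default: true")
        lines.append("    data_files:")
        lines.append("      - split: train")
        lines.append(f"        path: data/{name}/train/*.parquet")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative decades only; use the decades actually present under data/.
    print(build_configs_front_matter(range(1640, 1670, 10)), end="")
```

The printed block would replace the existing front matter at the very top of README.md; the README body below the closing --- stays as it is. Quoting every config_name keeps decade names such as "1640" as strings rather than integers in YAML.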

The syntax and behavior are defined in the Hub docs: use configs: with data_files to define subsets manually; this is separate from the metadata block. (Hugging Face)

If you commit that change to main, the viewer should show a “Subset” dropdown with all decades.
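Before committing, a quick local check can confirm whether the front matter actually has a top-level configs: key, which is what went wrong here. This is a rough, stdlib-only sketch and deliberately not a YAML parser: the helper names inspect_front_matter and diagnose are hypothetical, and it only recognizes the config_name layout used in this thread. After pushing, datasets.get_dataset_config_names("iguanodon-ai/kubhist2") from the datasets library should also list the subsets.

```python
"""Rough README.md front-matter check (stdlib only, NOT a real YAML parser).

Hypothetical helpers: they scan the text between the leading '---' lines,
record top-level keys, and collect config_name values found under a
top-level configs: key. Anything nested under other keys (e.g. metadata:)
is ignored, which is exactly the mistake discussed in this thread.
"""
import re

CONFIG_NAME_RE = re.compile(r"\s*(?:-\s*)?config_name:\s*['\"]?([^'\"]+?)['\"]?\s*$")


def inspect_front_matter(readme_text):
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {"has_front_matter": False, "top_level_keys": [], "configs": []}

    top_level_keys, configs = [], []
    current_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        if not line[0].isspace() and not line.startswith("-"):
            # An unindented, non-list line starts a new top-level key.
            current_key = line.split(":", 1)[0].strip()
            top_level_keys.append(current_key)
            continue
        if current_key == "configs":
            match = CONFIG_NAME_RE.match(line)
            if match:
                configs.append(match.group(1))
    return {"has_front_matter": True, "top_level_keys": top_level_keys, "configs": configs}


def diagnose(readme_text):
    info = inspect_front_matter(readme_text)
    if not info["has_front_matter"]:
        return "no YAML front matter at the top of README.md"
    if "configs" not in info["top_level_keys"]:
        return "no top-level configs: key found (top-level keys: %s)" % ", ".join(info["top_level_keys"])
    return "subsets: " + ", ".join(info["configs"])
```

For example, diagnose(open("README.md", encoding="utf-8").read()) on a README that still wraps everything in metadata: reports that no top-level configs: key was found, while the fixed README lists the decades plus all.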

drvenabili · September 24, 2025, 12:38pm

That did it! Thank you very much John!

Looks like I forgot to update the README header when migrating from the older dataset_info format to the newer configs one, my bad.

Related topics

Dataset viewer doesn't work if split != "train" (🤗Datasets) · 9 replies · 58 views · October 1, 2025
How to configure the order of subsets in the dataset viewer (🤗Datasets) · 2 replies · 77 views · October 3, 2024
The Full Dataset Viewer is Not available, Only showing preview of rows (🤗Datasets) · 0 replies · 96 views · July 18, 2024
Datasets viewer preview only (🤗Datasets) · 3 replies · 72 views · April 24, 2025
Ull dataset viewer is not available (🤗Datasets) · 4 replies · 222 views · January 8, 2025