Dataset viewer not showing subsets?

Hi,

I’m running into an issue I can’t find a solution to. I have created a dataset (iguanodon-ai/kubhist2).

It has one split (train) but several configs. To set this up, I followed the documentation page “Structure your repository”:

  - config_name: '1710'
    data_files:
      - split: train
        path: data/1710/train/*.parquet
  - config_name: '1720'
    data_files:
      - split: train
        path: data/1720/train/*.parquet

The structure is the same as this HF-released dataset (HuggingFaceFW/finepdfs).

The problem I’m having is that on my dataset the viewer shows only the split and not the subset:

but on the HF dataset, the viewer does show all the subsets:

What is going on? Can someone please enlighten me? Is it because I’m uploading parquet files and the automated parquet-converter tries to change the data?

Thanks!


Oh. It seems to be working now (maybe thanks to your own commit).

Hi! Thanks, but I don’t see any change: the viewer only shows one split (train), and not the subsets (1640, 1650, 1660, etc.). Can you please provide a screenshot of what you see?


the viewer only shows one split (train), and not the subsets (1640, 1650, 1660, etc.)

sure…


Thanks. Unfortunately, that’s exactly the issue I don’t understand. I have the exact same config as HuggingFaceFW/finepdfs (in the README.md), and yet the result is different from theirs.


How about this…?


Why it still shows one subset: your README YAML nests the decades under metadata: → dataset_info: rather than under a top-level configs: key. The Hub viewer reads configs: to build its list of subsets. Your README currently lists the decades under metadata → dataset_info, plus an all config marked default: true. That does not populate the viewer’s subset dropdown. (Hugging Face)
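A quick way to check which case a README falls into is to look at the top-level keys of its front-matter block. The sketch below is only an illustration: the function name and both sample READMEs are made up for this post (the broken sample follows the metadata → dataset_info layout described above, not the actual file), and it uses plain string handling rather than a real YAML parser.

```python
def front_matter_top_level_keys(readme_text: str) -> list[str]:
    """Return the top-level YAML keys of a README's leading front-matter block."""
    lines = readme_text.splitlines()
    # Front matter only counts if the very first line is '---'.
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return keys
        # Top-level keys are unindented and are neither list items nor comments.
        if line and not line[0].isspace() and not line.startswith(("-", "#")) and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # no closing '---', so this is not a front-matter block


# Hypothetical README that wraps everything in metadata: (assumed layout).
BROKEN = """---
metadata:
  dataset_info:
    - config_name: "1640"
---
# kubhist2
"""

# Hypothetical README with configs: at the top level.
FIXED = """---
configs:
  - config_name: "1640"
    data_files:
      - split: train
        path: data/1640/train/*.parquet
---
# kubhist2
"""

for name, text in [("broken", BROKEN), ("fixed", FIXED)]:
    keys = front_matter_top_level_keys(text)
    print(name, keys, "configs present:", "configs" in keys)
```

If configs does not appear among the top-level keys, the subset definitions are nested somewhere the viewer is not looking.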

The precise fix:

  1. Put a YAML front-matter block at the very top of README.md using configs:.
  2. Keep your all entry if you want it as default, but list every decade under configs: too.
  3. Remove the metadata: wrapper. Optionally, keep a separate dataset_info: block, but note that it does not drive the viewer.

Minimal example to paste at the very top of README.md:

---
configs:
  - config_name: "1640"
    data_files:
      - split: train
        path: data/1640/train/*.parquet
  - config_name: "1650"
    data_files:
      - split: train
        path: data/1650/train/*.parquet
  # …repeat for all decades…
  - config_name: "all"
    default: true
    data_files:
      - split: train
        path: data/all/train/*.parquet
---

Reference syntax and behavior are defined in the Hub docs: use configs: with data_files for manual subsets; this is distinct from the metadata block. (Hugging Face)
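Since every decade entry follows the same pattern, the block can also be generated rather than typed by hand. This is a minimal sketch, assuming the data/<name>/train/*.parquet layout from the example above. The function name is made up, and the decade range passed in at the bottom is a placeholder: substitute the decades that actually exist in the repository.

```python
def build_configs_front_matter(decades, default_name="all"):
    """Build a README front-matter block with one config per decade plus a default config."""
    lines = ["---", "configs:"]
    names = [str(d) for d in decades] + [default_name]
    for name in names:
        lines.append(f'  - config_name: "{name}"')
        if name == default_name:
            # Only the catch-all config is marked as the default subset.
            lines.append("    default: true")
        lines += [
            "    data_files:",
            "      - split: train",
            f"        path: data/{name}/train/*.parquet",
        ]
    lines.append("---")
    return "\n".join(lines)


# Placeholder range (assumption): replace with the decades present in the repo.
print(build_configs_front_matter(range(1640, 1670, 10)))
```

The printed block can be pasted at the very top of README.md, replacing the existing front matter.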

If you commit that change to main, the viewer should show a “Subset” dropdown with all decades.


That did it! Thank you very much John!

Looks like I failed to update the header when migrating from the older dataset_info to the newer configs. My bad.
