Got wrong row number of dataset viewer

animepfp · June 25, 2024, 1:15pm

I hope it helps.

I believe so. Just for people Googling in the future.

For a dataset repository to use the right data, you must consider:

You must have a YAML configuration inside a README.md markdown document that points to your training splits. save_to_disk does not create this README.md.
save_to_disk will create a dataset_info.json and state.json, but those files have no effect as far as the UI is concerned.
The UI will ignore the .arrow files produced by save_to_disk and instead relies on a hierarchy of file extensions that it looks for while crawling the repository.
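
For anyone who wants to see what the YAML in the README.md might look like, here is a minimal sketch that generates it. The `configs` / `config_name` / `data_files` / `split` / `path` keys follow the Hub's dataset card format. The helper name `build_viewer_config` and the `train/*.arrow` and `test/*.arrow` paths are my own placeholders, not something from this repository, and whether the viewer then counts rows correctly for .arrow files is not something this snippet guarantees.

```python
def build_viewer_config(splits, config_name="default"):
    """Return README.md YAML front matter mapping split names to file patterns.

    `splits` is a dict such as {"train": "train/*.arrow"} (hypothetical paths).
    """
    lines = ["---", "configs:", f"- config_name: {config_name}", "  data_files:"]
    for split, pattern in splits.items():
        # One entry per split, each pointing at a glob pattern in the repo.
        lines.append(f"  - split: {split}")
        lines.append(f'    path: "{pattern}"')
    lines.append("---")
    return "\n".join(lines) + "\n"


# Placeholder layout: adjust the patterns to where your files actually live.
header = build_viewer_config({"train": "train/*.arrow", "test": "test/*.arrow"})
print(header)
```

The printed block goes at the very top of README.md, before any markdown text.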

Do I have this correct? This was unexpected to me, but if this is the way it works, this is the way it works.

I have updated the README.md to reflect the arrow files, but it still reports the wrong number of rows.
