Enabling dataset viewer by coexistence of loading script and parquet files

shunk031 · March 17, 2024, 1:22pm

Hi, thank you for developing huggingface datasets!

I understand that the dataset viewer is disabled for datasets that use a loading script, for security reasons.

As the title says, is there a way to enable the dataset viewer by uploading both the loading script and the Parquet files myself? For some datasets I have created (e.g., shunk031/JGLUE), the dataset viewer appears to be enabled when the loading script is on the main branch and the Parquet files are on the refs/convert/parquet branch. Can I enable the viewer by generating Parquet files from the loading script and pushing them directly to the refs/convert/parquet branch?

lhoestq · March 18, 2024, 10:51am

Hi! No, pushing Parquet files to the refs/convert/parquet branch won't enable the Viewer.

Instead, I would recommend removing the loading script and pushing the Parquet files to the main branch.

Since your dataset is made up of multiple subsets, you can add a few lines of YAML to define which files belong to which subset; see the Viewer documentation.
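For readers who want to see what those few lines of YAML might look like, here is a minimal sketch that generates a `configs` block of the kind the Viewer documentation describes, for placement in the front matter at the top of the dataset's README.md (between the `---` lines). The helper name `build_configs_yaml`, the choice of subsets, and the file paths are all hypothetical; adjust them to wherever the Parquet files actually live in the repository.

```python
def build_configs_yaml(subsets):
    """Render a `configs:` YAML block mapping each subset to its Parquet files.

    `subsets` maps a subset (config) name to a dict of split name -> path pattern.
    """
    lines = ["configs:"]
    for config_name, splits in subsets.items():
        lines.append(f"- config_name: {config_name}")
        lines.append("  data_files:")
        for split, pattern in splits.items():
            lines.append(f"  - split: {split}")
            lines.append(f'    path: "{pattern}"')
    return "\n".join(lines)


# Hypothetical layout: one directory per subset, one Parquet shard set per split.
layout = {
    "JSTS": {
        "train": "JSTS/train-*.parquet",
        "validation": "JSTS/validation-*.parquet",
    },
    "JNLI": {
        "train": "JNLI/train-*.parquet",
        "validation": "JNLI/validation-*.parquet",
    },
}

front_matter = build_configs_yaml(layout)
print(front_matter)
```

Writing the block by hand works just as well; generating it is mainly convenient when a dataset has many subsets.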

shunk031 · March 18, 2024, 11:23am

Thank you for your kind reply!

Do you know if there is a way for the loading script and the Parquet files to coexist? If I upload a formatted version of the data published by the paper's authors, I would like to keep the formatting procedure (my loading script) as well.

lhoestq · March 18, 2024, 11:28am

In that case, we recommend keeping the Parquet files on main and creating a new branch named script that contains the original data and the loading script.

This way, the following call loads the Parquet files, which are also what the Viewer uses:

load_dataset("shunk031/JGLUE")

And this call is useful for those who want to use the loading script and the original data:

load_dataset("shunk031/JGLUE", revision="script")

Feel free to mention the existence of this branch in the dataset's README.md.
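The two calls above can be wrapped in a small helper so that users choose between the Parquet files and the script branch explicitly. This is only a sketch under stated assumptions: the helper name `jglue_load_kwargs` is hypothetical, the `script` branch is assumed to exist as described above, and `trust_remote_code=True` is assumed to be required by `datasets` versions that still support loading scripts (it is not mentioned in this thread).

```python
def jglue_load_kwargs(subset, use_script=False):
    """Build keyword arguments for datasets.load_dataset.

    By default the Parquet files on main are used; with use_script=True the
    hypothetical `script` branch (original data + loading script) is used.
    """
    kwargs = {"path": "shunk031/JGLUE", "name": subset}
    if use_script:
        kwargs["revision"] = "script"
        # Assumption: needed on `datasets` versions that run loading scripts.
        kwargs["trust_remote_code"] = True
    return kwargs


parquet_kwargs = jglue_load_kwargs("JSTS")
script_kwargs = jglue_load_kwargs("JSTS", use_script=True)
print(parquet_kwargs)
print(script_kwargs)

# With the `datasets` library installed and network access:
# from datasets import load_dataset
# ds = load_dataset(**parquet_kwargs)
```

Keeping the script path opt-in means the default behavior matches what the Viewer shows.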

shunk031 · March 18, 2024, 11:47am

I understand: I can simply push the Parquet files to the main branch and the script to another branch.

Thank you for your help!

system · March 18, 2024, 11:48pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
