Enabling dataset viewer by coexistence of loading script and parquet files

Hi, thank you for developing huggingface datasets!

I understand that the dataset viewer is disabled for datasets that use a loading script, for security reasons.

As the title says, is there a way to enable the dataset viewer by uploading both the loading script and Parquet files myself? In some datasets I have created (e.g., shunk031/JGLUE), the dataset viewer appears to be enabled when the loading script is in the main branch and the Parquet files are in the refs/convert/parquet branch. Can I enable the viewer by generating Parquet files from the loading script and pushing them directly to the refs/convert/parquet branch?

Hi! No, pushing Parquet files to the refs/convert/parquet branch won’t enable the Viewer.

Instead, I would recommend removing the loading script and pushing the Parquet files to the main branch.

Since your dataset is made of multiple subsets, you can add a few lines of YAML to the dataset README.md to define which files belong to which subset; see the Viewer documentation.
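For illustration, here is a minimal sketch of what those YAML lines could look like, generated from a small Python helper. The helper name `render_configs_yaml`, the subset names, and the file paths are placeholders and not the actual JGLUE layout; the `configs` / `config_name` / `data_files` keys follow the format described in the Viewer documentation.

```python
def render_configs_yaml(subsets):
    """Render the `configs` front matter block for a dataset README.md.

    `subsets` maps a subset name to a {split name: path glob} mapping.
    """
    lines = ["---", "configs:"]
    for name, splits in subsets.items():
        lines.append(f"- config_name: {name}")
        lines.append("  data_files:")
        for split, pattern in splits.items():
            lines.append(f"  - split: {split}")
            lines.append(f'    path: "{pattern}"')
    lines.append("---")
    return "\n".join(lines)


# Placeholder subset names and paths (assumptions for this example only).
example = {
    "subset_a": {
        "train": "subset_a/train-*.parquet",
        "validation": "subset_a/validation-*.parquet",
    },
    "subset_b": {"train": "subset_b/train-*.parquet"},
}
front_matter = render_configs_yaml(example)
print(front_matter)
```

The printed block would go at the very top of README.md on the main branch, next to the Parquet files it points to.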

Thank you for your kind reply!

Do you know if there is a way for the loading script and the Parquet files to coexist? If I upload a formatted version of the data published by the paper's authors together with my loading script, I would like to keep the formatting procedure as well.

In that case, we recommend keeping the Parquet files in main and creating a new branch named script that contains the original data and the loading script.

This way, the following call loads the Parquet files, which are also what the Viewer uses:

from datasets import load_dataset
load_dataset("shunk031/JGLUE")

And this is useful for those who want to use the loading script and the original data:

load_dataset("shunk031/JGLUE", revision="script")

Feel free to mention the existence of this branch in the dataset README.md.
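The branch layout suggested above could be set up roughly as follows. This is only a sketch: the helper `publish_script_branch` is hypothetical, and it takes an `api` object that is assumed to behave like `huggingface_hub.HfApi` (it relies on its `create_branch` and `upload_file` methods). The file name `JGLUE.py` in the usage comment is also an assumption.

```python
def publish_script_branch(api, repo_id, local_files, branch="script"):
    """Create `branch` on a dataset repo and upload files to it.

    `api` is expected to behave like `huggingface_hub.HfApi`.
    `local_files` maps a local file path to its path inside the repo.
    """
    # exist_ok=True makes the call a no-op if the branch already exists.
    api.create_branch(
        repo_id=repo_id, branch=branch, repo_type="dataset", exist_ok=True
    )
    for local_path, path_in_repo in local_files.items():
        # Upload to the side branch only; main is left untouched.
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="dataset",
            revision=branch,
        )
    return branch


# Usage (needs network access and a logged-in Hugging Face account):
# from huggingface_hub import HfApi
# publish_script_branch(HfApi(), "shunk031/JGLUE", {"JGLUE.py": "JGLUE.py"})
```

Passing the API object in as a parameter keeps the upload logic easy to try against a stub before touching a real repository.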


I understand: I can simply push the Parquet files to the main branch and the script to another branch.

Thank you for your help!
