[BUILDER_CONFIGS] Include Additional Dropdowns in Dataset Page

arubenruben · August 2, 2023, 11:42am

I have a new language identification dataset that I want to upload to the HF Hub. However, I can’t find any good information on how to create an additional dropdown for selecting the language variant.

The final dataset page should have two dropdowns, one for the language and one for the train/test/validation split (each language has its own train/test/val split).

OSCAR, for example, includes a “Subset” dropdown. Does anyone have a tutorial on how to do this correctly?

mariosasko · August 16, 2023, 3:11pm

You can achieve this by having one config for each language. Info on how to define configs is available here.
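A minimal sketch of the config-per-language approach. It assumes the `configs` field in the dataset card’s YAML header (the front matter of README.md) described in the Hub’s repository-structure documentation, and a hypothetical layout where each language has its own folder with one CSV file per split (`<lang>/<split>.csv`). The helper `build_configs_yaml` and the language codes are illustrative, not part of any library:

```python
# Hypothetical language codes and splits; adjust to the real dataset.
LANGUAGES = ["en", "es", "pt"]
SPLITS = ["train", "validation", "test"]


def build_configs_yaml(languages, splits, ext="csv"):
    """Return README.md front matter declaring one config per language.

    Each config (one dropdown entry) maps every split name to the file
    holding that language's split, e.g. pt/train.csv.
    """
    lines = ["---", "configs:"]
    for lang in languages:
        lines.append(f"- config_name: {lang}")
        lines.append("  data_files:")
        for split in splits:
            lines.append(f"  - split: {split}")
            lines.append(f"    path: {lang}/{split}.{ext}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_configs_yaml(LANGUAGES, SPLITS))
```

Pasting the printed front matter at the top of README.md should make the dataset page show one config entry per language alongside the split selector; `load_dataset("<user>/<repo>", "pt")` would then load only that language’s splits.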

arubenruben · August 17, 2023, 11:11am

Thank you, Mario. I had seen that page before, but I wrongly dismissed it because it talks about file structuring; the API it describes appears to solve my issue. When I have some spare time, I will try your suggestion and mark it as the solution.

arubenruben · August 25, 2023, 7:34pm

Thank you for your answer, Mario. I believe there is plenty of room for Hugging Face to improve its documentation on this step. In the end I did not use the approach from your link; I had to use “dataset_scripts” instead.

I will soon share my experience in a Medium post that covers this subject in detail. In short, it was necessary to work at the file level: I had initially intended to avoid creating intermediate files, but to get the full potential of this feature you have to work at the Git/HF repository level.
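Working at the file level can look like the following sketch. It assumes the source data is one combined list of rows, each carrying hypothetical `language` and `split` keys, and writes intermediate files in a `<lang>/<split>.csv` layout that a per-language config can point to. The function name and the `text`/`label` column names are illustrative assumptions, not taken from the original post:

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_per_language_splits(rows, out_dir, fieldnames=("text", "label")):
    """Group rows by (language, split) and write <out_dir>/<lang>/<split>.csv.

    Returns the sorted list of written paths, relative to out_dir.
    """
    # Bucket every row under its (language, split) pair.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["language"], row["split"])].append(row)

    written = []
    for (lang, split), group in sorted(groups.items()):
        target = Path(out_dir) / lang / f"{split}.csv"
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("w", newline="", encoding="utf-8") as fh:
            # extrasaction="ignore" drops the language/split keys,
            # since both are already encoded in the file path.
            writer = csv.DictWriter(
                fh, fieldnames=list(fieldnames), extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(group)
        written.append(f"{lang}/{split}.csv")
    return written
```

The resulting folders can then be committed to the dataset repository with Git, one folder per language.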
