I have a new language identification dataset that I want to upload to the HF Hub. However, I’m not finding any good information on how to create an additional dropdown for selecting the language variant.
The final dataset page should have two dropdowns, one for the language and one for the train/test/validation split (each language has its own train/test/val split).
For example, the OSCAR dataset page includes a “Subset” field. Does anyone have a tutorial on how to set this up correctly?
You can achieve this by having one config for each language. Info on how to define configs is available here.
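As a rough illustration of the one-config-per-language idea, here is a minimal sketch that generates the YAML front matter for the dataset's README.md, which is where the Hub reads manually declared configs. The language codes, the `<language>/<split>.csv` layout, and the helper name `build_configs_yaml` are assumptions made up for this example, not something from the linked page.

```python
def build_configs_yaml(languages, splits=("train", "validation", "test"), ext="csv"):
    """Return README.md front matter declaring one config per language.

    Assumes (hypothetically) that the repo stores files as <language>/<split>.<ext>.
    """
    lines = ["---", "configs:"]
    for lang in languages:
        # Each config_name becomes one entry in the "Subset" dropdown.
        lines.append(f"- config_name: {lang}")
        lines.append("  data_files:")
        for split in splits:
            # Each split becomes one entry in the "Split" dropdown for that config.
            lines.append(f"  - split: {split}")
            lines.append(f"    path: {lang}/{split}.{ext}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical language codes, for illustration only.
    print(build_configs_yaml(["en", "pt"]))
```

The printed block would go at the very top of README.md in the dataset repository; the paths need to match the files actually present in the repo.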
Thank you, Mario. I saw that page once, but I wrongly dismissed it because it talked about file structuring; the approach it describes looks like it will solve my issue. When I have some spare time I will try your fix and mark it as the solution.
Thank you, Mario, for your answer. I believe there is plenty of room for Hugging Face to improve its documentation on this step. In the end I could not use the approach from your link and had to use “dataset_scripts” instead.
I will soon share my experience in a Medium post that covers this subject in detail. In short, it was necessary to work at the file level. I initially intended to avoid creating intermediate files, but working at the Git/HF repository level is required to get the full potential of this feature.
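For anyone landing here, this is a minimal sketch of what "working at the file level" can look like: writing one intermediate CSV per language and split before pushing the folder to the repository. The `text`/`label` columns, the directory layout, and the helper name `write_language_splits` are assumptions for illustration, not necessarily what I ended up using.

```python
import csv
from pathlib import Path


def write_language_splits(root, data):
    """Write data as <root>/<language>/<split>.csv and return the written paths.

    `data` is a dict of the form {language: {split: [(text, label), ...]}}.
    """
    root = Path(root)
    written = []
    for lang, splits in data.items():
        lang_dir = root / lang
        lang_dir.mkdir(parents=True, exist_ok=True)
        for split, rows in splits.items():
            path = lang_dir / f"{split}.csv"
            with path.open("w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(["text", "label"])  # header row (assumed columns)
                writer.writerows(rows)
            written.append(path)
    return written
```

The resulting folder can then be committed to the dataset repository like any other set of files.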