I am trying to use push_to_hub to create a dataset composed of multiple subsets (e.g., “dataset_1”, “dataset_2”, etc.) and, within each subset, different splits (e.g., “train”, “test”, “dev”), like the GLUE dataset already available on the Hub.
Is there a way to do it?
Thanks a lot in advance for your help!
Hello, and welcome to the forum!
If you want your subsets and splits to be loaded programmatically, you can implement a dataset loading script, as is done for GLUE.
Let me know if that helps.
Hi! We are working on this. With push_to_hub you will be able to have several subsets, one per directory, as defined in our documentation on how to structure your dataset repository (but with Parquet files instead of CSV).
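For anyone landing here later, this is a rough sketch of what the one-directory-per-subset idea could look like. It assumes a datasets release in which push_to_hub accepts a config_name argument (that was not available when this thread started), and that you are logged in to the Hub. The SUBSETS data, the helper names planned_layout and push_all, and the repo id are made up for illustration. The file names push_to_hub actually writes may differ from the simplified ones listed here.

```python
# Hypothetical in-memory data: subset name -> split name -> list of rows.
SUBSETS = {
    "dataset_1": {
        "train": [{"text": "a", "label": 0}, {"text": "b", "label": 1}],
        "test": [{"text": "c", "label": 0}],
        "dev": [{"text": "d", "label": 1}],
    },
    "dataset_2": {
        "train": [{"text": "e", "label": 1}],
        "test": [{"text": "f", "label": 0}],
        "dev": [{"text": "g", "label": 1}],
    },
}


def planned_layout(subsets):
    """List simplified repository paths: one directory per subset, one Parquet file per split."""
    return [
        f"{subset}/{split}.parquet"
        for subset, splits in subsets.items()
        for split in splits
    ]


def push_all(repo_id, subsets):
    """Push every subset as its own configuration of the same dataset repo.

    Assumes `datasets` is installed, supports `config_name` in push_to_hub,
    and that you are authenticated with the Hub.
    """
    # Imported lazily so the layout helper above works without `datasets`.
    from datasets import Dataset, DatasetDict

    for subset, splits in subsets.items():
        dataset_dict = DatasetDict(
            {split: Dataset.from_list(rows) for split, rows in splits.items()}
        )
        dataset_dict.push_to_hub(repo_id, config_name=subset)


if __name__ == "__main__":
    print("\n".join(planned_layout(SUBSETS)))
    # Uncomment to upload; the repo id below is a placeholder.
    # push_all("your-username/your-dataset", SUBSETS)
```

If the push works as assumed, loading a single subset and split should then look like load_dataset("your-username/your-dataset", "dataset_1", split="train").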