`push_to_hub` a dataset dict with subsets and splits (e.g., GLUE)

Hi there,

I am trying to use `push_to_hub` to create a dataset composed of multiple subsets (e.g., “dataset_1”, “dataset_2”, etc.) and, within each subset, different splits (e.g., “train”, “test”, “dev”), like the GLUE dataset already available on the Hub.

Is there a way to do it?

Thanks a lot in advance for your help!

Hello :wave: and welcome to the Forum :hugs:

If you want your splits to be loaded programmatically, you can implement a dataset loading script, as is done for GLUE.

Let me know if it helps :raised_hands:t2:

Hi! We are working on this :slight_smile:

Ultimately, with `push_to_hub` you will be able to have several subsets, one per directory, as defined in our documentation on how to structure your dataset repository (but with Parquet files instead of CSV).
