`push_to_hub` a dataset dict with subsets and splits (e.g., GLUE)

Hi there,

I am trying to use push_to_hub to create a dataset composed of multiple subsets (e.g., “dataset_1”, “dataset_2”, etc.) and, within each subset, different splits (e.g., “train”, “test”, “dev”), like the GLUE dataset already available on the Hub.

Is there a way to do it?

Thanks a lot in advance for your help!

Hello and welcome to the forum!

If you want your splits to be loaded programmatically, you can implement a dataset loading script, as is done for GLUE.

Let me know if it helps.

Hi! We are working on this.

Ultimately, with push_to_hub you will be able to have several subsets, one per directory, as defined in our documentation on how to structure your dataset repository (but with Parquet files instead of CSV).

Hi, I have the same question. Is there a way to do this?

You can now use push_to_hub to push multiple subsets of your dataset! For example:

dataset_subset1.push_to_hub("username/dataset_name", "subset1")
dataset_subset2.push_to_hub("username/dataset_name", "subset2")

# later

from datasets import load_dataset

dataset_subset1 = load_dataset("username/dataset_name", "subset1")
dataset_subset2 = load_dataset("username/dataset_name", "subset2")

Each subset can be a DatasetDict made of multiple splits, or you can upload one split at a time:

dataset_subset1_train.push_to_hub("username/dataset_name", "subset1", split="train")
dataset_subset1_test.push_to_hub("username/dataset_name", "subset1", split="test")

# later

dataset_subset1_train = load_dataset("username/dataset_name", "subset1", split="train")
dataset_subset1_test = load_dataset("username/dataset_name", "subset1", split="test")
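
To tie the two variants together, here is a minimal sketch of pushing several subsets in one go, where each subset is a DatasetDict holding its own splits. The helper names `push_subsets` and `build_toy_subsets`, the `Pushable` protocol, and the toy data are hypothetical and only for illustration. The only Hub call used is `push_to_hub(repo_id, config_name)`, as in the snippets above.

```python
from typing import Mapping, Protocol


class Pushable(Protocol):
    """Any object exposing push_to_hub(repo_id, config_name), e.g. a datasets.DatasetDict."""

    def push_to_hub(self, repo_id: str, config_name: str = "default") -> object: ...


def push_subsets(repo_id: str, subsets: Mapping[str, Pushable]) -> list[str]:
    """Push each subset under its own configuration name; return the names pushed."""
    pushed = []
    for config_name, dataset_dict in subsets.items():
        # The second argument is the subset (configuration) name.
        dataset_dict.push_to_hub(repo_id, config_name)
        pushed.append(config_name)
    return pushed


def build_toy_subsets():
    """Hypothetical toy data: two subsets, each with train/test splits."""
    # Imported lazily so the helper above can be read and tested without
    # the `datasets` package installed (assumption: `pip install datasets`).
    from datasets import Dataset, DatasetDict

    def split(texts):
        return Dataset.from_dict({"text": texts})

    return {
        "subset1": DatasetDict({"train": split(["a", "b"]), "test": split(["c"])}),
        "subset2": DatasetDict({"train": split(["d", "e"]), "test": split(["f"])}),
    }
```

Calling `push_subsets("username/dataset_name", build_toy_subsets())` would need a logged-in account with write access to that repository. Afterwards, each subset should load with `load_dataset("username/dataset_name", "subset1")` as shown above.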

Hi @lhoestq! I think I found a bug that occurs whenever you try to overwrite what you have pushed before.
Could you please check out my post? Load_dataset() doesn’t load ONE of the Subset - Beginners - Hugging Face Forums

According to the description in Manual Configuration (huggingface.co), you can just add a README.md through the Hugging Face UI.
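
For anyone writing that README.md by hand, here is a small sketch that renders the `configs` YAML header for a set of subsets and splits. The function name `render_configs_block` is hypothetical, and the `subset/split-*` path pattern is an assumption about how the Parquet files are laid out in the repository. Adjust the paths to match your actual files.

```python
def render_configs_block(subsets: dict[str, list[str]]) -> str:
    """Render a README.md YAML header declaring one config per subset.

    `subsets` maps a subset (config) name to its split names.
    Assumption: files for a split live at "<subset>/<split>-*".
    """
    lines = ["---", "configs:"]
    for config_name, splits in subsets.items():
        lines.append(f"- config_name: {config_name}")
        lines.append("  data_files:")
        for split in splits:
            lines.append(f"  - split: {split}")
            lines.append(f"    path: {config_name}/{split}-*")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Build the header for two subsets with train/test splits.
readme_header = render_configs_block(
    {"subset1": ["train", "test"], "subset2": ["train", "test"]}
)
```

The resulting text can be pasted at the top of README.md in the dataset repository.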
