Hi! Instead of pushing each split separately, it’s better to create a `DatasetDict` and push everything in a single call to `push_to_hub`. You can do this as follows:
```python
from datasets import DatasetDict

ddict = DatasetDict({
    "split1": split1_ds,  # split1_ds is an instance of `datasets.Dataset`
    "split2": split2_ds,
    "split3": split3_ds,
    "split4": split4_ds,
})
ddict.push_to_hub("repo_id")
```
If you still want to push the sub-datasets separately, make sure that each split name is unique (you can control this with the `split` parameter of `push_to_hub`) and that you pass `ignore_verifications=True` when loading the dataset from the Hub (this is required due to a known bug, which will be fixed soon).
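A minimal sketch of the separate-push route, assuming a `datasets` version whose `load_dataset` still accepts `ignore_verifications`. The helper names `check_unique_split_names`, `push_splits_separately`, and `load_pushed_dataset` are hypothetical. Splits are passed as `(name, dataset)` pairs rather than a dict so that a repeated name is caught before anything is uploaded.

```python
from typing import Any, List, Sequence, Tuple


def check_unique_split_names(split_names: Sequence[str]) -> None:
    """Raise ValueError if any split name appears more than once."""
    seen = set()
    for name in split_names:
        if name in seen:
            raise ValueError(f"duplicate split name: {name!r}")
        seen.add(name)


def push_splits_separately(named_splits: List[Tuple[str, Any]], repo_id: str) -> None:
    """Push each (name, Dataset) pair to the same Hub repo under its own split name."""
    # Hypothetical helper: validate first so a duplicate name fails before any upload.
    check_unique_split_names([name for name, _ in named_splits])
    for name, ds in named_splits:
        # The `split` argument of push_to_hub sets the split name stored on the Hub.
        ds.push_to_hub(repo_id, split=name)


def load_pushed_dataset(repo_id: str):
    """Load all pushed splits back, skipping the verification affected by the bug."""
    # Imported here so the helpers above can be used without `datasets` installed.
    from datasets import load_dataset

    # Workaround for the known verification bug. Later `datasets` releases
    # deprecated `ignore_verifications` in favor of `verification_mode="no_checks"`.
    return load_dataset(repo_id, ignore_verifications=True)
```

For example, `push_splits_separately([("split1", split1_ds), ("split2", split2_ds)], "repo_id")` would push two splits to the same repo, and `load_pushed_dataset("repo_id")` would read them back.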