Save `DatasetDict` to HuggingFace Hub

Hi there,

I prepared my data as a `DatasetDict` object and saved it to disk with the `save_to_disk` method. I’d like to upload the generated folder to the HuggingFace Hub and load it with the usual `load_dataset` function, but I have not yet found a way to do so. Is this possible?

Thanks a lot in advance for your help.

Best,
Pietro

Hi,

This week’s release of `datasets` will add support for pushing a `Dataset`/`DatasetDict` object directly to the Hub. In the meantime, you can call a `to_{format}` method, where `format` is one of ["csv", "json", "txt", "parquet"], on each split of the `DatasetDict` object and push the generated files to the Hub (follow the docs here for more information). Also note that loading the pushed files with a plain `load_dataset` call requires the master version of the library, which you can install with:

pip install git+https://github.com/huggingface/datasets.git

Without the master version, you’ll have to specify a list of files to load each split separately (docs on that are here).
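
To make the workaround above concrete, here is a minimal sketch, not an official recipe. It assumes Parquet as the export format and one file per split named after the split; the helper names `export_splits` and `reload_splits` are made up for illustration. The first helper calls `to_parquet` on each split, and the second shows the explicit per-split file mapping passed to `load_dataset` through `data_files`.

```python
from pathlib import Path


def export_splits(dataset_dict, out_dir):
    """Write each split to <out_dir>/<split>.parquet.

    Returns a {split_name: file_path} mapping that can be passed to
    load_dataset as `data_files`. (Hypothetical helper, not part of datasets.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data_files = {}
    for split_name, split in dataset_dict.items():
        path = out / f"{split_name}.parquet"
        # Dataset.to_parquet writes one split; to_csv / to_json are used the same way.
        split.to_parquet(str(path))
        data_files[split_name] = str(path)
    return data_files


def reload_splits(data_files):
    """Load the exported files back, naming the file for each split explicitly."""
    # Imported lazily so export_splits has no hard dependency on the import.
    from datasets import load_dataset

    return load_dataset("parquet", data_files=data_files)
```

For example, `export_splits(load_from_disk("my_dataset"), "export")` (with `load_from_disk` imported from `datasets` and both paths being placeholders) should leave files such as `export/train.parquet` that you can then upload to your dataset repository on the Hub.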


Hi @mariosasko,

Thanks a lot for your answer! I will try this out later and let you know how it goes. Excited about the upcoming feature :)

Best,
Pietro