Streaming for Saving

Hi,
I am looking for a way to download a large dataset, transform it, and then upload it to another location. Note that the transformation for each instance is independent of the others.

I can load the dataset in streaming mode and start the transformation, but I cannot find a way to write to the Hugging Face Hub (in batches) while the download and transformation are ongoing. Wondering if such a pattern exists.

2 Likes

In the case of the datasets library’s push_to_hub, I think you can’t upload the data unless all of it is available…
If the files are written out frequently, in the worst case there is a way to manually upload them one after another using HfApi…
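
A minimal sketch of that manual approach, assuming the transformed shards have already been written locally as Parquet files (the repo id and file names below are placeholders):

```python
from huggingface_hub import HfApi

api = HfApi()
# Placeholder repo id; create the dataset repo once up front.
repo_id = "username/my-transformed-dataset"
api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

# Upload each finished shard as soon as it exists on disk.
for i, local_path in enumerate(["shard-00000.parquet", "shard-00001.parquet"]):
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=f"data/shard-{i:05d}.parquet",
        repo_id=repo_id,
        repo_type="dataset",
    )
```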

What if I just want to transform the dataset and then save it in a streaming way? If the dataset is large, CPU memory usage keeps growing while transforming the dataset. Or should I transform and save the data in parts instead of waiting until the entire dataset has been transformed?

1 Like

Yeah. It is now possible to save parquet files per shard or upload them incrementally.
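
For example, here is a sketch that combines streaming, a per-example transform, per-shard Parquet writes, and incremental uploads with HfApi. The dataset names, shard size, column name, and the transform itself are placeholders:

```python
from itertools import islice

import pyarrow as pa
import pyarrow.parquet as pq
from datasets import load_dataset
from huggingface_hub import HfApi


def transform(example):
    # Placeholder per-example transformation; assumes a "text" column.
    example["text"] = example["text"].lower()
    return example


api = HfApi()
repo_id = "username/my-transformed-dataset"  # placeholder
api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

# Stream the source dataset so the full data is never materialized in memory.
ds = load_dataset("source/dataset", split="train", streaming=True).map(transform)

shard_size = 10_000  # examples per Parquet shard (placeholder)
it = iter(ds)
shard_idx = 0
while True:
    batch = list(islice(it, shard_size))
    if not batch:
        break
    local_path = f"shard-{shard_idx:05d}.parquet"
    pq.write_table(pa.Table.from_pylist(batch), local_path)
    # Push this shard immediately instead of waiting for the whole dataset.
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=f"data/{local_path}",
        repo_id=repo_id,
        repo_type="dataset",
    )
    shard_idx += 1
```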

1 Like