Load shards as one dataset

chenchaozhao · February 16, 2024, 3:47am

I processed the datasets into several shards. If I want to load them as one piece, I can concatenate them, but indexing all of the files takes some time. Is there a quicker way to load the dataset, such as memory-mapping it from several dataset shards?

Related topics

Topic	Category	Replies	Views	Activity
How to concatenate 100s of small datasets into a very large dataset? Without loading into memory?	🤗Datasets	1	432	May 18, 2023
[urgent]Can you reconstruct datasets using the cache file (.arrow file)?	🤗Datasets	5	1074	August 27, 2021
How to save datasets as distributed with save_to_disk?	🤗Datasets	1	2470	November 15, 2022
`load_dataset` results in OOM	🤗Datasets	0	179	June 25, 2024
[Bug?] Datasets map and concatenation after sharding OOM	🤗Datasets	1	31	September 4, 2024