I have processed my dataset into several shards. To load them as a single dataset I can concatenate them, but indexing all of the files takes some time. Is there a quicker way to load the dataset, such as memory-mapping it directly from the individual shards?