I have processed my dataset into several shards. To load them as a single dataset I can concatenate them, but indexing all of the files takes a while. Is there a quicker way to load the dataset, such as memory-mapping it directly from the shards?