I processed the dataset into several shards. If I want to load them as one piece, I can concatenate them, but indexing all of the files takes some time. Is there a quicker way to load the dataset, such as memory-mapping the shards directly?