If a dataset is stored as Parquet files, loaded with Hugging Face `load_dataset`, and then shuffled, does this mean that batches contain rows from several files? Or does shuffling only change the order in which the Parquet files are read?
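
To make the two possibilities concrete, here is a minimal pure-Python sketch of how they could be told apart. It does not call the `datasets` library; the file names, row layout, and helper functions are all hypothetical stand-ins. As far as I understand, a map-style `Dataset.shuffle()` permutes row indices across the whole dataset (the first case below), while a streaming `IterableDataset.shuffle(buffer_size=...)` shuffles shard order and then samples from a buffer, which sits somewhere between the two cases.

```python
import random

# Hypothetical stand-in for three Parquet files: each row remembers its source file.
files = {
    f"part-{i}.parquet": [(f"part-{i}.parquet", r) for r in range(4)]
    for i in range(3)
}

def row_level_shuffle(files, seed):
    """Permute all rows globally, as a shuffle over row indices would."""
    rows = [row for rows_in_file in files.values() for row in rows_in_file]
    random.Random(seed).shuffle(rows)
    return rows

def file_order_shuffle(files, seed):
    """Permute only the order in which files are read; rows keep their in-file order."""
    names = list(files)
    random.Random(seed).shuffle(names)
    return [row for name in names for row in files[name]]

def files_per_batch(rows, batch_size):
    """Count the distinct source files that contribute to each batch."""
    return [
        len({name for name, _ in rows[i:i + batch_size]})
        for i in range(0, len(rows), batch_size)
    ]

row_batches = files_per_batch(row_level_shuffle(files, seed=0), batch_size=4)
file_batches = files_per_batch(file_order_shuffle(files, seed=0), batch_size=4)
print("row-level shuffle, files per batch: ", row_batches)
print("file-order shuffle, files per batch:", file_batches)
```

The same check could be run against a real dataset by adding a column that records the source file before shuffling and then counting distinct values per batch.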