What is the exact definition of lazy loading? Is it the choice between IterableDataset and Dataset that decides whether lazy loading happens? My understanding is that lazy loading means we don't load all the data into memory at once, so it only happens when we use IterableDataset. Is that right?
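To make concrete what I mean by lazy loading, here is a toy sketch in plain Python (the class names are just illustrative, not the real datasets classes): the eager, map-style version materializes every example up front, while the lazy, iterable version only builds each example when it is consumed.

```python
class EagerDataset:
    """Toy map-style dataset: all examples built immediately."""
    def __init__(self, n):
        # Every example lives in RAM from this point on.
        self.examples = [{"id": i, "value": i * i} for i in range(n)]

    def __getitem__(self, idx):      # supports random access
        return self.examples[idx]

    def __len__(self):
        return len(self.examples)


class LazyDataset:
    """Toy iterable dataset: examples built on demand, one at a time."""
    def __init__(self, n):
        self.n = n                   # nothing materialized yet

    def __iter__(self):              # sequential access only
        for i in range(self.n):
            yield {"id": i, "value": i * i}


eager = EagerDataset(1_000)          # 1,000 dicts already in memory
lazy = LazyDataset(1_000)            # only the integer n is stored

first = next(iter(lazy))             # exactly one example created here
print(first)                         # {'id': 0, 'value': 0}
print(eager[42]["value"])            # 1764
```

This matches my mental model of "lazy": the iterable version never holds more than one example at a time.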
This leads to another question: does IterableDataset use memory-mapping and zero-copy to retrieve data? Will IterableDataset and Dataset occupy the same amount of RAM when loading the same dataset? And if we just read the data locally, without shuffling, is the speed difference between IterableDataset and Dataset simply because contiguous sequential access is faster than random access?
I also noticed that a huge amount of virtual memory (around 100 GB, roughly the size of my dataset) is occupied when I use load_from_disk or load_dataset without streaming to load .arrow files. Is that normal? From the blog post, my understanding is that zero-copy does rely on virtual memory, and the amount of virtual memory reserved scales with the dataset size. Is that right?
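Here is a small stdlib-only demo of what I think is going on (the file name and sizes are made up for illustration): memory-mapping a file reserves virtual address space for its entire length, but pages are only pulled into physical RAM when they are actually touched, and a memoryview over the mapping reads bytes without copying the whole file.

```python
import mmap
import os
import tempfile

# Stand-in for a large .arrow file: an 8 MiB file of 'x' bytes.
path = os.path.join(tempfile.gettempdir(), "mmap_demo.bin")
size = 8 * 1024 * 1024
with open(path, "wb") as f:
    f.write(b"x" * size)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_len = len(mm)             # the mapping spans the whole file:
                                     # this is virtual memory, not RAM used
    view = memoryview(mm)            # zero-copy view over the mapped bytes
    sample = bytes(view[1024:1032])  # touching this range faults in one page
    view.release()
    mm.close()

os.remove(path)
print(mapped_len, sample)            # 8388608 b'xxxxxxxx'
```

If this model is right, the ~100 GB of virtual memory I see would just be the mapping of my ~100 GB of .arrow files, not actual RAM consumption.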