How to load a large HF dataset efficiently?

@lhoestq what do you think about loading this dataset lazily, for example in batches of 64 samples at a time? Would that be less efficient than loading those shards of data?
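To make the question concrete, here is a rough sketch of what I mean by lazy batching. It is plain Python and not the `datasets` API: `iter_batches` and `fake_samples` are hypothetical names I made up for illustration, and `fake_samples` just stands in for whatever iterable the real dataset would provide.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_batches(samples: Iterable[Dict], batch_size: int = 64) -> Iterator[List[Dict]]:
    """Yield lists of up to `batch_size` samples, pulling from `samples` on demand."""
    it = iter(samples)
    while True:
        # islice only consumes the next `batch_size` items, so nothing beyond
        # the current batch is held in memory.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def fake_samples(n: int) -> Iterator[Dict]:
    # Hypothetical stand-in for a streamed dataset: yields one sample at a time.
    for i in range(n):
        yield {"id": i}


if __name__ == "__main__":
    sizes = [len(batch) for batch in iter_batches(fake_samples(150), batch_size=64)]
    print(sizes)  # [64, 64, 22]
```

As far as I understand, the object returned by `load_dataset(..., streaming=True)` is iterable, so it could presumably be passed as `samples` here, but I have not measured how that compares to loading the shards directly.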