How to load a large hf dataset efficiently?

Saving a dataset on HF using .push_to_hub() does upload multiple shards.
In particular, it splits the dataset into shards of about 500MB (the default max_shard_size) and uploads each shard as a Parquet file on the Hub.
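For example, a minimal sketch (the dataset and repo name "username/my-dataset" here are placeholders, not from the original post):

```python
from datasets import load_dataset

ds = load_dataset("imdb", split="train")

# push_to_hub uploads the dataset as Parquet shards.
# max_shard_size defaults to "500MB" but can be overridden, e.g.:
ds.push_to_hub("username/my-dataset", max_shard_size="200MB")
```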

It’s also possible to manually get a shard of a dataset using the .shard() method.
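For instance, a small sketch of .shard() (again using a placeholder repo name), which lets you work on one slice of the data at a time:

```python
from datasets import load_dataset

ds = load_dataset("username/my-dataset", split="train")

# Split the dataset into 4 equal-sized shards and keep only the first one
first_shard = ds.shard(num_shards=4, index=0)
print(first_shard.num_rows)
```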