Downloading a portion of parquet files

Using streaming mode, you only download the data you actually iterate over instead of the whole dataset, e.g.:

from datasets import load_dataset

refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

# skip the first 10 samples, then take the next 10, i.e. samples [10:20]
subset = list(refinedweb.skip(10).take(10))
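To illustrate why skip/take on a stream only pulls the rows it needs, here is a minimal stdlib sketch that mimics the same semantics on a lazy generator with `itertools.islice`. It is an analogy, not how `datasets` implements `IterableDataset`; `lazy_source` and `skip_take` are hypothetical helpers that stand in for the streamed dataset and for `.skip(n).take(m)`.

```python
from itertools import islice


def lazy_source(n, counter):
    """Hypothetical stand-in for a streamed dataset: yields rows on demand
    and records how many rows were actually produced."""
    for i in range(n):
        counter["produced"] += 1
        yield {"id": i}


def skip_take(stream, skip, take):
    """Mimic .skip(skip).take(take): drop `skip` rows, then yield `take` rows."""
    return islice(stream, skip, skip + take)


counter = {"produced": 0}
stream = lazy_source(1_000_000, counter)  # nothing is produced yet
subset = list(skip_take(stream, 10, 10))

print([row["id"] for row in subset])  # → [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(counter["produced"])            # → 20
```

Only 20 of the million rows are ever generated, which is the same idea that lets streaming mode avoid downloading the full dataset.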