How do I load part of the dataset?

sKlklkjjkj · April 30, 2025, 6:34pm

I have a big dataset, un_pc.

I want to load only part of it, about 200k rows to start. How do I do this?

John6666 · April 30, 2025, 9:07pm

I wonder if streaming would be a good option…

Victorano · May 2, 2025, 10:19pm

First, load the dataset from the Hugging Face Hub, then select as many rows as you need.
For example:

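from datasets import load_dataset
dataset_name = "Helsinki-NLP/un_pc"
# un_pc is organized into language-pair configs; "en-fr" is an assumed example, pick the pair you need
dataset = load_dataset(dataset_name, "en-fr", split="train")
# Keep only the first 200k rows
train_dataset = dataset.select(range(200000))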

Note that the full dataset is still downloaded to your computer; only the selected 200k rows end up in train_dataset.

Zelgodiz · May 5, 2025, 4:10am

You can work with a subset of a dataset in Hugging Face `datasets` by combining load_dataset() with row selection, streaming, or file selection. Here are a few ways to do it:

1. Using `.select()` to Load a Specific Number of Rows

If your dataset is already loaded, you can select 200,000 rows like this:

from datasets import load_dataset

dataset = load_dataset("your_dataset_name")
subset = dataset["train"].select(range(200000))  # Select first 200k rows

2. Using Streaming to Avoid Full Download

If your dataset is too large to download in full or to fit on disk, use streaming mode:

from datasets import load_dataset
dataset = load_dataset("your_dataset_name", split="train", streaming=True)
subset = dataset.take(200000)  # Lazily yields only the first 200k rows

This avoids downloading the entire dataset: rows are fetched only as you iterate over subset.
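To see why a lazy take stops reading early, here is a small pure-Python analogy. It is not how `datasets` is implemented; `row_source` and `take` are hypothetical helpers that only mimic the idea of pulling rows one at a time from a large source.

```python
from itertools import islice


def row_source(total_rows, counter):
    """Hypothetical stand-in for a large remote dataset: yields one row at a time."""
    for i in range(total_rows):
        counter["produced"] += 1  # count how many rows were actually read
        yield {"id": i}


def take(rows, n):
    """Lazily yield at most the first n rows (an analogy for a streaming take)."""
    return islice(rows, n)


counter = {"produced": 0}
subset = list(take(row_source(1_000_000, counter), 200))

# Only the rows that were requested got read from the source.
print(len(subset), counter["produced"])
```

The source could hold a million rows, but only the first 200 are ever produced, which is the same reason a streamed subset does not need the full download.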

3. Using `data_files` to Load a Specific File

If your dataset consists of multiple files, you can load only a specific portion:

dataset = load_dataset("your_dataset_name", data_files={"train": "train_part1.csv"})

For more details, check out the Hugging Face documentation or community discussions. Let me know if you need help with a specific dataset!
