How do I load part of the dataset?

First, load the dataset from the Hugging Face Hub, then select as many rows as you need.
For example:

from datasets import load_dataset

dataset_name = "Helsinki-NLP/un_pc"
# Download (or reuse the cached copy of) the full train split
dataset = load_dataset(dataset_name, split="train")
# Keep only the first 200,000 rows
train_dataset = dataset.select(range(200000))

Note that the full dataset will still be downloaded to your computer; only the selected 200k rows end up in train_dataset.
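If downloading everything is a problem, streaming mode may be worth a try: passing streaming=True to load_dataset returns an iterable dataset that fetches rows lazily, so you can stop after the first N. Below is a minimal sketch of that idea, not a tested recipe for this particular dataset. The helper names first_rows and stream_first_rows are made up for illustration, and the config_name argument is an assumption for datasets that require a configuration to be named.

```python
from itertools import islice


def first_rows(rows, n):
    """Return the first n items of any iterable as a list.

    islice stops after n items, so the rest of the iterable is never read.
    """
    return list(islice(rows, n))


def stream_first_rows(dataset_name, n, config_name=None, split="train"):
    """Hypothetical helper: stream a Hub dataset and keep only the first n rows."""
    # Imported here so first_rows can be used without the library installed.
    from datasets import load_dataset

    # streaming=True yields rows lazily instead of downloading the whole split.
    stream = load_dataset(dataset_name, config_name, split=split, streaming=True)
    return first_rows(stream, n)


# Example call (needs network access; pass config_name if the dataset requires one):
# rows = stream_first_rows("Helsinki-NLP/un_pc", 200000)
```

The result here is a plain list of row dictionaries rather than a Dataset object, so this fits best when you only need to iterate over the rows once or convert them yourself.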
