How to split Hugging Face dataset to train and test?

stevhliu · July 26, 2022, 4:13pm

Hello and welcome @laro1!

You can use the train_test_split() method and set the test_size parameter to control the size of the test split. A float such as 0.3 holds out 30% of the rows for the test set. For example:

ds.train_test_split(test_size=0.3)

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label'],
        num_rows: 525
    })
    test: Dataset({
        features: ['premise', 'hypothesis', 'label'],
        num_rows: 225
    })
})

Check out the docs here and let me know if that helps!
