How to split main dataset into train, dev, test as DatasetDict

For now, you can write it like this:

from datasets import DatasetDict

# Split into train, test, and validation sets.

# 90% train, 10% test + validation
train_testvalid = dataset.train_test_split(test_size=0.1)

# Split the 10% test + validation in half: half test, half validation
test_valid = train_testvalid['test'].train_test_split(test_size=0.5)

# Gather everything if you want a single DatasetDict
train_test_valid_dataset = DatasetDict({
    'train': train_testvalid['train'],
    'test': test_valid['test'],
    'valid': test_valid['train']})

train_test_valid_dataset
