Datasets.load_dataset not returning 'eval' or 'test'

petarulev · May 16, 2022, 12:32pm

I am using load_dataset to load my data, which is basically a set of files stored in a directory.

I load the data with:
dataset = load_dataset(path='/Users/petar/Documents/bert/data', split='train')

Only the “train” split is available, and all of my data is stored there. I would also like to get an eval or test split so I can use “per_device_eval_batch_size” in TrainingArguments. How can I create that split and specify its size?

stevhliu · May 16, 2022, 4:59pm

Hi @petarulev! You can use the Datasets train_test_split function to create the splits and adjust the size of each one:

dataset.train_test_split(test_size=0.2)

Check out the docs here

petarulev · May 17, 2022, 7:02am

Thank you!
