Datasets.load_dataset not returning 'eval' or 'test'

I am using load_dataset to load my data, which is a set of files stored in a directory. I load the data with:
dataset = load_dataset(path='/Users/petar/Documents/bert/data', split='train')

Only the “train” split is available, and all my data is stored there. I would also like to get an eval or test split so I can use “per_device_eval_batch_size” in TrainingArguments. How can I create that split and specify its size?

Hi @petarulev! You can use the Datasets train_test_split method to create the splits and set the size of each:

dataset.train_test_split(test_size=0.2)

Check out the docs here :slight_smile:

Thank you! :slight_smile: