Hi there,
I am wondering: what is currently the most elegant way to perform a three-way random split (into train, validation, and test sets)? Let’s assume I call load_dataset so that I get:
Dataset({
    features: ['text'],
    num_rows: 19122
})
Subsequently, I’d like to perform the split. Currently I am calling dataset.train_test_split() twice and then recombining the three datasets into one using DatasetDict. However, I assume that this is not the most elegant approach, right? I also experimented with ReadInstruction, but with it I could only split the data deterministically instead of randomly…
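To make concrete what I mean by a random three-way split, here is a minimal sketch in plain Python that shuffles the row indices once and cuts them into three disjoint parts. The function name, the 80/10/10 fractions and the seed are placeholders chosen for illustration, not anything taken from the library.

```python
import random


def three_way_split_indices(num_rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Return shuffled, disjoint (train, val, test) lists of row indices.

    Hypothetical helper for illustration; fractions and seed are assumptions.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    indices = list(range(num_rows))
    # A local RNG keeps the shuffle reproducible without touching global state.
    random.Random(seed).shuffle(indices)

    n_val = int(num_rows * val_frac)
    n_test = int(num_rows * test_frac)
    n_train = num_rows - n_val - n_test  # train gets whatever is left over

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 19122 is the num_rows of the dataset shown above.
train_idx, val_idx, test_idx = three_way_split_indices(19122)
print(len(train_idx), len(val_idx), len(test_idx))

# Assuming `dataset` is the loaded datasets.Dataset, the index lists could
# then be applied with Dataset.select and wrapped in a DatasetDict, e.g.:
#   DatasetDict({"train": dataset.select(train_idx),
#                "val": dataset.select(val_idx),
#                "test": dataset.select(test_idx)})
```

This shuffles only once, so the three parts cannot overlap, and the same seed always gives the same split.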
Has anyone got a better solution?