Hi @thecity2, as far as I know train_test_split operates on Dataset objects, not DatasetDict objects.
For example, this works:
from datasets import load_dataset

squad = (load_dataset('squad', split='train')
         .train_test_split(train_size=800, test_size=200))
because I’ve picked the train split, so load_dataset returns a Dataset object. On the other hand, this does not work:
squad = load_dataset('squad').train_test_split(train_size=800, test_size=200)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-d3fb264651eb> in <module>
----> 1 squad = load_dataset('squad').train_test_split(train_size=800, test_size=200)
AttributeError: 'DatasetDict' object has no attribute 'train_test_split'
It seems that your load_dataset call is returning a DatasetDict, so you could try applying train_test_split to one of the Dataset objects that live inside it.