AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

Hi @thecity2, as far as I know train_test_split operates on Dataset objects, not DatasetDict objects.

For example, this works:

from datasets import load_dataset

squad = (load_dataset('squad', split='train')
        .train_test_split(train_size=800, test_size=200))

because I’ve picked the train split and so load_dataset returns a Dataset object. On the other hand, this does not work:

squad = load_dataset('squad').train_test_split(train_size=800, test_size=200)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-d3fb264651eb> in <module>
----> 1 squad = load_dataset('squad').train_test_split(train_size=800, test_size=200)

AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

It seems that your load_dataset call is returning a DatasetDict, so you could try calling train_test_split on one of the Dataset objects inside it, for example the one stored under the 'train' key.
