I’m defining my own dataset. To do this, I followed the tutorial in the docs and created a dataset loading script (see the docs).
But I’m facing an issue: my data is located in a single file, and I would like to split this data into train and test subsets.
As far as I understand, it’s not possible.
In the _split_generators() method, since I have a single file, I can assign it only to a single SplitGenerator…
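To make the idea concrete, here is a minimal sketch of the behaviour I would like: the same file is read once per split, and each record is routed to train or test based only on its position in the file. This is plain Python, not the datasets API. The helper name `iter_split`, the `test_every` parameter, and the one-record-per-line layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def iter_split(path, split, test_every=5):
    """Yield (index, line) pairs from `path` that belong to `split`.

    Hypothetical helper: every `test_every`-th record goes to "test",
    the rest go to "train", so the assignment depends only on the
    record's position in the file and is identical on every run.
    """
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    with open(path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            is_test = idx % test_every == 0
            # Keep the record only if it belongs to the requested split.
            if is_test == (split == "test"):
                yield idx, line.rstrip("\n")


# Small demo on a throwaway file with 10 records (assumed layout: one per line).
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.txt"
    data_file.write_text(
        "\n".join(f"record {i}" for i in range(10)) + "\n", encoding="utf-8"
    )
    train = list(iter_split(data_file, "train"))
    test = list(iter_split(data_file, "test"))
```

In a loading script, something like this could presumably be called from the example generator, with the split name passed through each SplitGenerator's gen_kwargs, but I have not confirmed that this is the intended way to do it.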
As an alternative, I made a single split in my dataset loading script and tried to call train_test_split() deterministically, but even with a fixed random seed, it gives different results every time…
PS: I know I could just split my single file, but unfortunately I don’t have control over that file…