Just curious: how do I create a train/test split from a dataset that doesn’t have a length function? I don’t want to download and tokenize the whole dataset before splitting it into training and test sets.
Hi! I think the only option is to sample the input dataset while iterating over it (e.g., in the training loop) and set the sampled examples aside as the test split.
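To make the sampling idea concrete, here is a minimal sketch in plain Python. It assumes the dataset can be treated as an ordinary iterable of examples; `tag_split`, `fake_stream`, and the `test_fraction` and `seed` parameters are hypothetical names made up for illustration, not part of any library. Each example is assigned to "test" with a fixed probability as it streams by, so nothing needs a length and nothing is downloaded up front.

```python
import random
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def tag_split(
    examples: Iterable[T], test_fraction: float = 0.1, seed: int = 0
) -> Iterator[Tuple[str, T]]:
    """Lazily tag each streamed example as "train" or "test".

    Hypothetical helper: one random draw per example, so the dataset
    never needs a length and is only read as it is consumed.
    """
    rng = random.Random(seed)  # private RNG so the split is repeatable
    for example in examples:
        split = "test" if rng.random() < test_fraction else "train"
        yield split, example


def fake_stream() -> Iterator[dict]:
    """Stand-in for a streaming dataset with no __len__ (assumption)."""
    for i in range(1000):
        yield {"id": i}


train, test = [], []
for split, example in tag_split(fake_stream(), test_fraction=0.2, seed=42):
    # In a real training loop you would tokenize and train on "train"
    # examples here and buffer (or write out) the "test" ones.
    (test if split == "test" else train).append(example)
```

Two caveats with this approach: the test share is only approximately `test_fraction`, since each example is drawn independently, and the same seed reproduces the same split only if the dataset is iterated in the same order each time.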