For this split function to work, does the dataset need to be in memory? I want to do this with streaming=True as well.
It’s not possible yet to do it when streaming. You can still split your dataset at a fixed index, though:
# use the first 100 samples for testing, the rest for training
test, train = ds.take(100), ds.skip(100)
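
In case it helps to see what a take/skip split does on a stream, here is a rough plain-Python sketch of the idea. It is not how the datasets library implements it: `make_stream`, `take`, and `skip` are hypothetical helpers written only for illustration, and the sketch assumes the source can be re-read from the start for each split.

```python
from itertools import islice


def make_stream(n=10):
    # Hypothetical stand-in for a streaming dataset:
    # every call restarts the stream from the first example.
    return ({"id": i} for i in range(n))


def take(stream_factory, n):
    # Lazily yield only the first n examples.
    return islice(stream_factory(), n)


def skip(stream_factory, n):
    # Lazily drop the first n examples and yield the rest.
    return islice(stream_factory(), n, None)


# Use the first 3 examples for testing, everything after for training.
test = list(take(make_stream, 3))
train = list(skip(make_stream, 3))
```

Note that the split is purely positional: if the stream is sorted (for example by label), the first examples may not be representative of the rest.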