How to split main dataset into train, dev, test as DatasetDict

For this split function to work, does the dataset need to be in memory? I want to do this with streaming=True as well.

It’s not possible yet to do this when streaming. You can still split your dataset at a certain index, though:

# use the first 100 samples for testing and the rest for training
test, train = ds.take(100), ds.skip(100)
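
To get train, dev and test out of the same idea, the take/skip calls can be chained. The following is only a minimal sketch: `three_way_split` and `FakeStream` are hypothetical names, and `FakeStream` is an offline stand-in that mimics `take` and `skip` so the example runs without downloading anything. With a real streaming dataset you would pass the object returned by `load_dataset(..., streaming=True)` instead.

```python
from itertools import islice


class FakeStream:
    """Hypothetical stand-in that mimics .take() / .skip() for an offline demo."""

    def __init__(self, make_iter):
        # make_iter: zero-argument callable returning a fresh iterator
        self._make_iter = make_iter

    def __iter__(self):
        return self._make_iter()

    def take(self, n):
        # keep only the first n examples
        return FakeStream(lambda: islice(self._make_iter(), n))

    def skip(self, n):
        # drop the first n examples, keep the rest
        return FakeStream(lambda: islice(self._make_iter(), n, None))


def three_way_split(stream, n_test, n_dev):
    """Split by position: first n_test -> test, next n_dev -> dev, rest -> train."""
    test = stream.take(n_test)
    rest = stream.skip(n_test)
    dev = rest.take(n_dev)
    train = rest.skip(n_dev)
    return {"train": train, "dev": dev, "test": test}


# Demo on a stream of the integers 0..9
splits = three_way_split(FakeStream(lambda: iter(range(10))), n_test=2, n_dev=3)
```

The result here is a plain dict; I believe it can be wrapped as IterableDatasetDict(splits) when the values are real streaming datasets, but check that against your installed version. Note that the split follows stream order, so it is not a random split unless the stream is shuffled first.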