Hi,
What is the most efficient way to slice a dataset? I have a dataset that contains only the input_ids and attention mask for evaluating my model, both stored as lists (not tensors). I need to split the dataset into chunks because my GPU memory is not large enough to fit everything in one go.
I tried `ds.select(range(0, 20))` and I tried `ds[0:20]`. Both operations take 2 seconds, and even if I increase the size from 20 to 50 elements it takes 5 seconds. So basically the time increases linearly with the number of elements.
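For reference, this is roughly how I walk over the data in chunks. It is only a minimal sketch: `iter_chunks` is a helper name I made up for this post, and `rows` is a plain Python list standing in for my real dataset so the snippet runs on its own.

```python
from typing import Any, Iterator


def iter_chunks(ds: Any, chunk_size: int) -> Iterator[Any]:
    """Yield consecutive slices of `ds`, each with at most `chunk_size` rows.

    Hypothetical helper: works with anything that supports len() and
    slice indexing.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(ds), chunk_size):
        # Same slicing form as ds[0:20]; the last chunk may be shorter.
        yield ds[start:start + chunk_size]


# Stand-in data (assumption): a list of rows instead of the real dataset.
rows = [{"input_ids": [i, i + 1], "attention_mask": [1, 1]} for i in range(50)]
chunks = list(iter_chunks(rows, 20))
print([len(c) for c in chunks])  # [20, 20, 10]
```

With my actual dataset, `ds[start:end]` gives back a dict of columns rather than a list of rows, so the per-chunk length check would look different, but the loop itself is the same.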
Is there a way to slice a dataset that is more time efficient?
Thanks!