Is dataset.select(range(10000)) efficient?
Is this the best way to select a slice of the dataset?
Yes. A monotonically increasing range of numbers lets us slice the underlying PyArrow table directly instead of generating an indices mapping, which would make later indexing slower.