Is `dataset.select(range(10000))` efficient?

Is this the best way to select a slice of the dataset?

Yes. Because `range(10000)` is a contiguous, monotonically increasing range, `select` can slice the underlying PyArrow table directly instead of building an indices mapping. An indices mapping adds a lookup on every access, which makes indexing slower.
