Yes, correct: filter() only stores the indices of the rows it keeps instead of writing a new copy of the data, which saves disk space.
For people who want to rewrite the dataset completely (e.g. to end up with contiguous data and get faster reads), there is ds.flatten_indices(), which rewrites the dataset and removes the indices mapping.