Most efficient way to retrieve N rows for a subset of columns

Hello,

I would like to retrieve rows from a dataset, given a sequence of indexes, as efficiently as possible. Each row contains many fields, so I would like to query the Arrow table for only a subset of columns and take full advantage of the columnar format.

My current method is the following:

from typing import Dict, Iterable, List, Optional

from datasets import Dataset


def retrieve_rows(dataset: Dataset, indexes: Iterable[int], keys: Optional[List[str]]) -> List[Dict]:
    """Retrieve the rows at `indexes` from `dataset`, keeping only the fields in `keys`."""
    if keys is not None and len(keys) == 1:
        key = keys[0]
        # dataset[key] loads the whole column before it is indexed
        column = dataset[key]
        retrieved_rows = [{key: column[i]} for i in indexes]
    else:
        # dataset[i] reads every column of the row; unwanted keys are dropped afterwards
        rows = map(dataset.__getitem__, indexes)
        retrieved_rows = [{k: v for k, v in row.items() if keys is None or k in keys} for row in rows]
    return retrieved_rows

Limitations

However, this approach has two limitations:

  1. for len(keys)==1, the whole column is loaded.
  2. for len(keys)>1, all columns are queried.

Dataset comes with a select method, but it creates a new Dataset object, which seems quite cumbersome for my use case.
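
To make the access pattern I am after concrete, here is a minimal sketch on a toy column store. The `ColumnStore` alias, the `gather_rows` helper and the `toy` data are all hypothetical: a plain dict of lists stands in for the Arrow table, and none of this is the `datasets` or `pyarrow` API. It only shows the order of operations that would avoid both limitations: select the columns first, then gather just the requested positions.

```python
from typing import Any, Dict, Iterable, List, Optional

# Toy column store: one Python list per column. This only stands in for the
# Arrow table; it is not the `datasets` or `pyarrow` API.
ColumnStore = Dict[str, List[Any]]


def gather_rows(
    columns: ColumnStore,
    indexes: Iterable[int],
    keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return the rows at `indexes`, reading only the columns named in `keys`."""
    indexes = list(indexes)  # allow generators: materialize the indexes once
    selected = list(columns) if keys is None else keys
    # Column-first: pick the requested columns, then gather only the requested
    # positions from each, so unrequested columns are never touched.
    gathered = {k: [columns[k][i] for i in indexes] for k in selected}
    # Reassemble the per-column values into one dict per requested row.
    return [{k: gathered[k][j] for k in selected} for j in range(len(indexes))]


toy = {
    "id": [10, 11, 12, 13],
    "title": ["a", "b", "c", "d"],
    "text": ["w", "x", "y", "z"],
}
rows = gather_rows(toy, indexes=[3, 0], keys=["id", "title"])
print(rows)  # [{'id': 13, 'title': 'd'}, {'id': 10, 'title': 'a'}]
```

Here the "text" column is never read, and the whole batch of indexes is handled in one call instead of one lookup per row.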

Questions

So my questions are:
a. How can I query rows for a subset of columns only?
b. How can I batch the queries (or pass an iterator of indexes)?
c. Alternatively, is it possible to return the Arrow table directly, so that I can fine-tune the queries myself?