Does `Dataset.map(..., batched=True, batch_size=N)` save the original order?

Hi. I have a dataset:

Dataset({
    features: ['text', 'request_index'],
    num_rows: 1000
})
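
(For anyone who wants to reproduce this: a toy dataset with the same shape can be built with `Dataset.from_dict`; the values below are made up.)

from datasets import Dataset

# Made-up stand-in for the real data: 1000 texts, each tagged with
# the request it belongs to (100 hypothetical requests here).
dataset = Dataset.from_dict({
    'text': [f'text {i}' for i in range(1000)],
    'request_index': [i % 100 for i in range(1000)],
})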

The dataset contains 1000 rows spanning N distinct `request_index` values. I want to build embeddings using a batched `Dataset.map`:

def _get_embeddings(self, texts: t.List[str]) -> t.List[t.List[float]]:
    # Tokenize the whole batch; padding/truncation give tensors a uniform shape.
    encoded_input = self.tokenizer(texts, padding=True, truncation=True, return_tensors='pt')

    with torch.no_grad():
        encoded_input = {k: v.to(self.device) for k, v in encoded_input.items()}
        model_output = self.model(**encoded_input)

    # One pooled embedding per input text, converted to plain Python lists.
    return model_output.pooler_output.tolist()

predictions = dataset.map(
    lambda x: {
        'embeddings': self._get_embeddings(x['text']),
        'request_index': x['request_index'],
    },
    batched=True,
    batch_size=4,
)

After that I have to group the embeddings by `request_index`:

{
    0: [ embedding1, embedding2, ... ],
    1: [ embedding3 ],
    ...
}
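
(For the record, the grouping itself only needs the two columns to line up row by row; a minimal sketch using `collections.defaultdict` and the `predictions` dataset from the `map` call above:)

from collections import defaultdict

# Collect all embeddings that share the same request_index.
grouped = defaultdict(list)
for request_index, embedding in zip(predictions['request_index'],
                                    predictions['embeddings']):
    grouped[request_index].append(embedding)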

The problem is that I couldn't find any information about the row order of the dataset after a batched `map`.

The documentation says the batched `map` method can call the callback on batches in parallel, so I'm not sure I will always get the same row order as in the original dataset.

Yes, `Dataset.map` always preserves the original row order, even with `num_proc > 1`. Batches are taken from the dataset in order, and when multiple processes are used the dataset is split into contiguous shards whose results are concatenated back in shard order, so row `i` of the output always corresponds to row `i` of the input.
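
A quick self-contained check, using toy data and a pass-through callback (`num_proc=2` just to exercise the multiprocessing path):

from datasets import Dataset

def passthrough(batch):
    # Copy the input column so output order can be compared to input order.
    return {'j': batch['i']}

ds = Dataset.from_dict({'i': list(range(1000))})
mapped = ds.map(passthrough, batched=True, batch_size=4, num_proc=2)

assert mapped['j'] == list(range(1000))  # row order is unchanged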