I am seeing different results when I do
dataset.map(..., batched=True, num_proc=4)
vs
dataset.map(..., batched=True, num_proc=16)
Here is the output:
Map (num_proc=4): 100%|████████████████████| 500/500 [00:00<00:00, 1019.49 examples/s]
Map (num_proc=16): 100%|████████████████████| 500/500 [00:00<00:00, 1684.70 examples/s]
Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 148
})
Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 143
})
Is it expected for num_rows to differ between the two outputs?
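One possible cause, sketched in plain Python without calling the `datasets` library: if the mapped function concatenates the tokens of a batch and cuts them into fixed-size blocks, dropping the remainder, then the number of output rows depends on where the batch boundaries fall. This is only a simulation under assumptions: that the dataset is split into contiguous shards, one per process, and that each shard is batched separately. The helper names (`chunk_batch`, `contiguous_shards`, `simulated_map`), the block size of 128, and the 37-token examples are all hypothetical and do not come from the code above.

```python
# Pure-Python simulation; it does NOT use the `datasets` library.
# All names and sizes here are hypothetical, chosen only for illustration.

def chunk_batch(token_lists, block_size):
    """Hypothetical batched map function: concatenate one batch of
    examples, cut it into fixed-size blocks, and drop the leftover tokens."""
    flat = [tok for toks in token_lists for tok in toks]
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]

def contiguous_shards(rows, num_shards):
    """Split rows into num_shards contiguous pieces of near-equal size."""
    base, extra = divmod(len(rows), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        shards.append(rows[start:start + size])
        start += size
    return shards

def simulated_map(rows, num_proc, batch_size=1000, block_size=128):
    """Assumed behaviour: one shard per process, each shard batched on its own."""
    out = []
    for shard in contiguous_shards(rows, num_proc):
        for i in range(0, len(shard), batch_size):
            out.extend(chunk_batch(shard[i:i + batch_size], block_size))
    return out

# 500 made-up examples of 37 tokens each.
rows = [[0] * 37 for _ in range(500)]
rows_4 = len(simulated_map(rows, num_proc=4))
rows_16 = len(simulated_map(rows, num_proc=16))
print(rows_4, rows_16)  # the two counts differ
```

In this sketch each shard discards its own leftover tokens, so more shards means more discarded remainders and fewer output rows. Whether this applies here depends on what the mapped function actually does with each batch.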