[Bug?] Datasets .map OOMs when concatenating shards

When using

dataset.map(
    myfunc,
    num_proc=16,
    keep_in_memory=False,
    cache_file_name='parts.arrow',
    batch_size=16,
    writer_batch_size=16,
)

Due to the size of my dataset, the run ends with:

/site-packages/datasets/table.py:1421:
  table = cls._concat_blocks(blocks, axis=0)
Killed

Looking carefully at the code in .map, I can see that the shards are created and the .arrow files exist once the progress bar reaches 100%; the OOM then happens in datasets/src/datasets/arrow_dataset.py (tag 2.21.0 in the huggingface/datasets GitHub repo), at:

            logger.info(f"Concatenating {num_proc} shards")
            result = _concatenate_map_style_datasets(transformed_shards)

Not sure if it’s a bug that writing the individual Arrow files works but combining the shards fails.

Is there any way to resolve this and avoid the OOM in the _concat_blocks function?
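For context, the workaround I'm considering is to shard the dataset myself, map each shard into its own Arrow cache file, and only then rebuild the full dataset with concatenate_datasets, hoping everything stays memory-mapped. This is an untested sketch; myfunc and the dataset here are placeholders for my real setup, and I don't know whether concatenate_datasets ends up on the same _concat_blocks path:

from datasets import Dataset, concatenate_datasets

def myfunc(batch):
    # stand-in for my real mapping function
    return batch

# stand-in for my real (much larger) dataset
dataset = Dataset.from_dict({"text": ["example"] * 1_000})

num_shards = 16
mapped_shards = []
for i in range(num_shards):
    # contiguous shards, each mapped into its own Arrow cache file on disk
    shard = dataset.shard(num_shards=num_shards, index=i, contiguous=True)
    mapped = shard.map(
        myfunc,
        keep_in_memory=False,
        cache_file_name=f"parts_{i:05d}.arrow",
        batch_size=16,
        writer_batch_size=16,
    )
    mapped_shards.append(mapped)

# Hoping this keeps the shards memory-mapped instead of materializing
# everything in RAM, but I'm not sure it avoids _concat_blocks.
full = concatenate_datasets(mapped_shards)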

Also asked on Stack Overflow: "How to resolve OOM when .map concatenates the sharded parts?"