ArrowMemoryError: realloc of size 32 GB failed

I am trying to encode an image dataset of 20,000 images, but when the encoding reaches 69% it fails with `realloc of size 32 GB failed`, irrespective of the batch size. I even tried increasing the batch size, but the error persists. The installed pyarrow version is 10.0.1 and the datasets version is 2.7.1.

Hi! How do you create your dataset?

Right now, every dataset loaded from disk uses memory mapping so that it doesn't fill up your RAM. However, datasets created from in-memory data currently stay in memory.

So if you used `Dataset.from_dict`, for example, you may want to write your dataset to disk to avoid filling up your RAM. You can use `ds.save_to_disk()` and reload it with `load_from_disk()` before calling your `map` function.
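
For anyone following along, here is a minimal sketch of that workflow; `image_paths`, `encode_batch`, and the `"my_dataset"` path are placeholders, not from the original posts:

```python
from datasets import Dataset, load_from_disk

# Hypothetical in-memory dataset; `image_paths` stands in for your real data.
image_paths = [f"img_{i}.png" for i in range(20000)]
ds = Dataset.from_dict({"image_path": image_paths})

# Write the in-memory dataset to disk, then reload it so the data is
# memory-mapped from disk instead of held entirely in RAM.
ds.save_to_disk("my_dataset")
ds = load_from_disk("my_dataset")

# Placeholder batched function; replace with your actual image encoder.
def encode_batch(batch):
    return {"encoded": [path.upper() for path in batch["image_path"]]}

# map() now reads batches via memory mapping and writes results to cache files.
ds = ds.map(encode_batch, batched=True, batch_size=100)
```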

Thank you @lhoestq for the solution. I will definitely try it and see if it works.