Dataset map() creates a lot of cache files

I am loading a CSV file (about 1.5 GB) from disk using load_dataset(). It creates files under the cache directory.

dataset = load_dataset('csv', data_files=filepath)

When I apply map functions to the dataset like below, the cache size keeps growing.

dataset = dataset.map(preprocess_1, num_proc=8)
dataset = dataset.map(preprocess_2, num_proc=8)

Is there a way to disable caching for each map() call?

I tried to disable caching at the library level using the following, but it still creates cache files.

from datasets import disable_caching
disable_caching()

Is there any solution/workaround to disable caching?

It does create files because it writes the resulting dataset to your disk, so it can be reloaded from there using memory mapping (and save some RAM). To keep your dataset in memory instead, you can pass keep_in_memory=True to map().
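For example, a minimal sketch (preprocess_1/preprocess_2 stand in for your own functions):

from datasets import load_dataset

dataset = load_dataset('csv', data_files=filepath)
# keep_in_memory=True keeps the mapped result in RAM instead of writing a cache file
dataset = dataset.map(preprocess_1, num_proc=8, keep_in_memory=True)
dataset = dataset.map(preprocess_2, num_proc=8, keep_in_memory=True)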

When caching is disabled, dataset files are written to temporary directories.
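Putting it together, a possible workaround sketch (assuming the ~1.5 GB dataset and its mapped copies fit in RAM):

from datasets import load_dataset, disable_caching

disable_caching()  # map() results that still need files go to temporary directories
dataset = load_dataset('csv', data_files=filepath, keep_in_memory=True)
dataset = dataset.map(preprocess_1, num_proc=8, keep_in_memory=True)
dataset = dataset.map(preprocess_2, num_proc=8, keep_in_memory=True)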

Thanks, @lhoestq