I am loading a CSV file (about 1.5 GB) from disk using load_dataset(). It creates files under the cache directory.
dataset = load_dataset('csv', data_files=filepath)
When I apply map functions to the dataset as shown below, the cache size keeps growing:
dataset = dataset.map(preprocess_1, num_proc=8)
dataset = dataset.map(preprocess_2, num_proc=8)
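A minimal sketch of how the cache growth can be measured between map() calls. The helper name dir_size_bytes is hypothetical, and the cache path is an assumption (the default ~/.cache/huggingface/datasets, or whatever HF_DATASETS_CACHE points to if that variable is set):

```python
import os
from pathlib import Path


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Assumed cache location: HF_DATASETS_CACHE if set, else the usual default.
cache_dir = os.environ.get(
    "HF_DATASETS_CACHE",
    str(Path.home() / ".cache" / "huggingface" / "datasets"),
)

if __name__ == "__main__":
    print(f"{dir_size_bytes(cache_dir) / 1e6:.1f} MB in {cache_dir}")
```

Running this before and after each map() call shows how much each step adds to the cache.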
Is there a way to disable caching for each map() call?
I tried to disable caching globally for the datasets library using the following, but it still creates cache files.
from datasets import disable_caching
disable_caching()
Is there any solution/workaround to disable caching?