In-memory dataset to disk for caching operations

I create datasets from pandas dataframes a lot, since it's an easier way to preserve columns that hold arrays and similar values. However, I've noticed that my .map operations aren't getting cached.
How can I turn an in-memory dataset into a disk-based one for caching (or load a dataframe directly as one)?

Hi! In-memory datasets only get a temporary cache that is bound to the current Python session. To cache operations permanently, save the dataset to disk with .save_to_disk("path/to/save/dir"), then reload it with datasets.load_from_disk("path/to/save/dir"). The reloaded dataset is backed by an Arrow file, so run your operations on it again and they will be cached on disk.
