How to save a mapped dataset

Hi,

Is it possible to save a mapped dataset? Mapping takes too much time every time I run the program.

You can use ds.save_to_disk("path/to/save_dir") and reload the result later with load_from_disk("path/to/save_dir").

"Mapping takes too much time every time I run the program"

Can you clarify what you mean by this? Does loading the dataset take a lot of time or something else?


Every time I run the program, it runs the map function again instead of caching the mapped dataset and reusing it :(

This means the function you pass to map (or its fn_kwargs, if you use them) does not hash deterministically, so the cache fingerprint changes on every run and the default cache is never hit. Please specify cache_file_name in the map call so that this file is used for caching instead of the default fingerprint-based mechanism.

You can also push_to_hub your mapped dataset, and then query it directly from DuckDB.

See this blog post from yesterday: DuckDB: analyze 50,000+ datasets stored on the Hugging Face Hub
