Hi,
Is it possible to save a mapped dataset? Mapping takes too much time every time I run the program.
You can use ds.save_to_disk("path/to/save_dir") and reload it later with load_from_disk.
Mapping takes too much time every time I run the program
Can you clarify what you mean by this? Does loading the dataset take a lot of time or something else?
Every time I run the program, it runs the map function again instead of reusing the cached mapped dataset.
This means the map transform (or its fn_kwargs, if you use them) cannot be hashed deterministically, so the cache fingerprint changes between runs and the cached result is never found. Please specify cache_file_name in the map call to use that file for caching instead of the default caching mechanism.
You can also upload your mapped dataset to the Hub with push_to_hub and then query it directly from DuckDB. See this blog post from yesterday: DuckDB: analyze 50,000+ datasets stored on the Hugging Face Hub