I’m currently using gcsfuse to load data from Arrow tables stored in a Google Cloud Storage bucket mounted to a VM folder. The tables were saved automatically when the map() function cached the datasets to the mounted folder, which works, but gcsfuse often struggles with large datasets (I/O errors, sluggish reads, etc.).
Is there a way to get the equivalent of save_to_disk for the map() function’s caching? That way I could load the tables through the S3 filesystem API instead, which is more stable.