I’d like to introduce Space (google/space on GitHub), an open source unified storage framework for the entire machine learning lifecycle. It adds data manipulation, materialized views, and version management features to popular ML datasets.
It supports lightweight conversion to and from HuggingFace datasets by reusing their Parquet files, so you can easily modify or incrementally transform data. Here is an example notebook: notebooks/huggingface_conversion.ipynb on the main branch of google/space.
Any feedback would be very helpful. Thank you!