How to Add New Data to an Existing Parquet Dataset?

I am importing an image dataset from an external source that is several terabytes in size. In the future, I will need to update this dataset by adding new files.

I found that I can achieve this simply by placing the new Parquet files in the same folder as the existing ones while keeping the column names consistent.
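To make the folder-based approach concrete, here is a minimal sketch of adding a new file to a local dataset folder without touching the existing ones. The naming scheme (`train-00000.parquet`, `train-00001.parquet`, ...) and the helper names `next_shard_name` and `add_shard` are assumptions for illustration, not part of the datasets library. The sketch also does not verify that column names match; that check would need a Parquet reader such as pyarrow.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme for illustration: <prefix>-<zero-padded index>.parquet
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d+)\.parquet$")


def next_shard_name(existing_names, prefix="train"):
    """Return a shard name whose index is one above the highest existing index."""
    indices = [
        int(m.group("index"))
        for name in existing_names
        if (m := SHARD_RE.match(name)) and m.group("prefix") == prefix
    ]
    next_index = max(indices, default=-1) + 1
    return f"{prefix}-{next_index:05d}.parquet"


def add_shard(dataset_dir, new_file, prefix="train"):
    """Copy new_file into dataset_dir under a fresh name; existing shards are left as-is."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    existing = [p.name for p in dataset_dir.glob("*.parquet")]
    target = dataset_dir / next_shard_name(existing, prefix)
    shutil.copy2(new_file, target)
    return target
```

Calling `add_shard("my_dataset", "incoming.parquet")` repeatedly would only ever add files, which is the append-only behavior described above, at least for a local folder.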

Is there a way to do append-only uploads using the datasets library?

Features like incremental upload may still be in the works…