Hey! I spent a few days trying to understand this while constantly running out of memory (OOM). Setting cache_file_name='test'
was a bit brittle, as it would just reuse that cache file no matter what the fingerprint was.
It seems like a dataset created with datasets.Dataset.from_dict()
doesn't have any cache files, so I had to save to CSV and then load with the CSV loader (which does seem to have some cache functionality):
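import datasets
import pandas as pd
pd.DataFrame({'id': folders}).to_csv("file.csv", index=False)  # folders: my list of ids
ds_ids = datasets.Dataset.from_csv("file.csv")  # loaded through the CSV loader, so it is backed by a cache file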