How to load this simple audio data set and use dataset.map without memory issues?

Hey! I spent a few days trying to understand this and kept hitting OOM errors. Setting cache_file_name='test' was also brittle, because it would reuse that cache file regardless of the fingerprint.

It seems that datasets.Dataset.from_dict() doesn't create any cache files, so I had to save the ids to CSV and then load them with the CSV loader (which seems to have some caching functionality):

    import datasets
    import pandas as pd
    pd.DataFrame({'id': folders}).to_csv("file.csv", index=False)  # folders: list of ids
    ds_ids = datasets.Dataset.from_csv("file.csv")
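
In case it helps, here is a rough sketch of how the CSV round trip could be combined with a batched map. Everything named here is an assumption for illustration: `write_id_csv`, `load_features`, and `build_dataset` are hypothetical helpers, the standard-library `csv` module stands in for the pandas call above, and the batch sizes are arbitrary. `load_features` is only a placeholder for the real audio loading, and I can't promise these settings fix the OOM in every setup.

```python
import csv
from pathlib import Path


def write_id_csv(folders, csv_path):
    """Write one id per row under an 'id' header (stand-in for the to_csv call)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])
        writer.writerows([folder] for folder in folders)
    return Path(csv_path)


def load_features(batch):
    # Hypothetical placeholder: replace with the real per-id audio loading.
    return {"n_chars": [len(str(i)) for i in batch["id"]]}


def build_dataset(csv_path):
    # Imported lazily so the CSV helper works without the library installed.
    import datasets

    # Loading from CSV gives a dataset backed by Arrow files on disk.
    ds_ids = datasets.Dataset.from_csv(str(csv_path))
    # Small batches and a small writer_batch_size mean fewer rows are
    # held in RAM before being written out to the cache file.
    return ds_ids.map(
        load_features,
        batched=True,
        batch_size=8,
        writer_batch_size=8,
    )
```

Usage would be something like `ds = build_dataset(write_id_csv(folders, "file.csv"))`. Since no cache_file_name is passed, map should pick the cache file from the fingerprint instead of reusing a fixed one.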