How to load this simple audio data set and use dataset.map without memory issues?

Wow! That has worked!

When I checked ds.cache_files, it returned an empty list.

Then I tried ds = ds.map(preprocess_function, remove_columns='audio', cache_file_name='test') and it worked with no issues at all. After that, ds.cache_files became [{'filename': 'test'}].

Thanks a lot for your help.

If you don’t mind me asking, how did you get this?

Since you loaded the dataset from memory using .from_pandas, it has no associated cache directory in which to save intermediate results.

I’ve read the docs for days but was never able to figure this out.
