Wow! That worked!
When I checked ds.cache_files, it returned an empty list.
Then I tried ds = ds.map(preprocess_function, remove_columns='audio', cache_file_name='test') and it worked with no issues at all. After that, ds.cache_files became [{'filename': 'test'}].
Thanks a lot for your help.
If you don’t mind me asking, how did you figure this out?
Since you loaded the dataset from memory using .from_pandas, the dataset has no associated cache directory to save intermediate results.
I’ve read the docs for days but was never able to figure this out.