How to load this simple audio data set and use dataset.map without memory issues?

akt42 · May 21, 2022, 5:06am

Wow! That worked!

When I checked ds.cache_files, it returned an empty list.

Then I tried ds = ds.map(preprocess_function, remove_columns='audio', cache_file_name='test') and it ran with no issues at all. After that, ds.cache_files became [{'filename': 'test'}]

Thanks a lot for your help.

If you don’t mind me asking, how did you work this out?

Since you loaded the dataset from memory using .from_pandas, the dataset has no associated cache directory to save intermediate results.

I’ve read the docs for days but was never able to figure this out.
