How to load this simple audio data set and use dataset.map without memory issues?

lhoestq · May 16, 2022, 1:29pm

Can you check `ds.cache_files`? Since you loaded the dataset from memory using `.from_pandas`, the dataset has no associated cache directory in which to save intermediate results.

To fix this, you can specify `cache_file_name` in `.map()`. This way the results are written to your disk instead of being kept in memory.
