Thanks @lhoestq! I think there's something wrong here. I've tried with a dataset size of N=10_000 and it always crashed on Colab (~13 GB RAM), even with `batch_size=1`:
```python
ds = ds.map(preprocess_function, remove_columns='audio', batch_size=1)
```
(The code I provided is reproducible in the free Colab tier with N=10_000.)
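One thing that might be worth checking is whether the growth comes from the cache writer buffer rather than the preprocessing itself. A sketch of what I mean (`writer_batch_size` is the `map()` parameter that controls how many processed rows are buffered in RAM before being flushed to the Arrow cache file; 100 here is an arbitrary value):

```python
# Sketch: lower writer_batch_size so fewer processed rows are held in RAM
# before being flushed to the on-disk Arrow cache (the default is 1000).
ds = ds.map(
    preprocess_function,
    remove_columns='audio',
    batch_size=1,
    writer_batch_size=100,  # arbitrary; smaller = less RAM, more disk I/O
)
```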
Another observation I've made is that memory usage grows roughly linearly while `ds.map()` is running. Could it be that the processed batches aren't being garbage collected?
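To make that observation concrete, here is roughly the check I mean, a sketch that logs the process RSS from inside the map call (`preprocess_with_logging` is just a hypothetical wrapper around the original `preprocess_function`, and `psutil` needs to be installed):

```python
import gc
import os

import psutil

process = psutil.Process(os.getpid())

def preprocess_with_logging(example):
    # Hypothetical wrapper around the original preprocess_function,
    # added only to watch memory while map() runs.
    out = preprocess_function(example)
    gc.collect()  # force a collection to see whether memory is actually freed
    print(f"RSS: {process.memory_info().rss / 1024**2:.0f} MB")
    return out

ds = ds.map(preprocess_with_logging, remove_columns='audio', batch_size=1)
```

If the RSS keeps climbing even with the explicit `gc.collect()`, the retained memory is probably held by something other than ordinary Python garbage.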