How to load this simple audio data set and use dataset.map without memory issues?

akt42 · May 10, 2022, 11:24am

Thanks @lhoestq. Unfortunately, the result is the same even when I try the smallest possible values for N=10000. Could it be that I'm making a mistake somewhere else in my code (I mean the minimal example I provided)?

ds = ds.map(
    preprocess_function,
    remove_columns='audio',
    batch_size=1,
    writer_batch_size=1
)
