How to load this simple audio data set and use dataset.map without memory issues?

lhoestq · May 10, 2022, 9:04am

Hi ! This is a good way to define a dataset for audio classification

During map, only one batch at a time is loaded in memory and passed to your preprocess_function . To use less memory you can try to reduce the writer_batch_size (default is 1,000)

ds = ds.map(preprocess_function, remove_columns='audio', writer_batch_size=100)

EDIT: changed batch_size to writer_batch_size

Topic		Replies	Views
Running out of memory during dataset.map() with `AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")` Beginners	3	3576	June 8, 2022
Loading data from Datasets takes too much memory 🤗Datasets	2	559	January 18, 2024
Dataset map during runtime 🤗Datasets	2	1297	September 13, 2023
Misunderstanding around creating audio datasets from Local files 🤗Datasets	12	1765	July 17, 2023
.map - function overloads my Cache Beginners	3	207	August 21, 2023

How to load this simple audio data set and use dataset.map without memory issues?

Related topics