How to load this simple audio data set and use dataset.map without memory issues?

Hi! This is a good way to define a dataset for audio classification 🙂

During map, only one batch at a time is loaded in memory and passed to your preprocess_function. To use less memory you can try reducing writer_batch_size (default is 1,000), which sets how many processed rows are held in memory before each write to the cache file on disk 😉

ds = ds.map(preprocess_function, remove_columns='audio', writer_batch_size=100)

EDIT: changed batch_size to writer_batch_size
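
To see why a smaller writer_batch_size helps, here is a toy model in plain Python. It is only a sketch of the buffering idea, not how the datasets library is actually implemented: buffered_map, write and the doubling function are made-up names for illustration. Processed rows pile up in a buffer and are flushed every writer_batch_size rows, so the peak number of processed rows held in memory is bounded by that value.

```python
def buffered_map(rows, fn, writer_batch_size, write):
    """Toy model (hypothetical helper, not the datasets API).

    Applies fn to each row, keeps the results in an in-memory buffer,
    and flushes the buffer through write() every writer_batch_size rows.
    Returns the largest number of rows the buffer ever held.
    """
    buffer = []
    peak = 0
    for row in rows:
        buffer.append(fn(row))
        peak = max(peak, len(buffer))
        if len(buffer) >= writer_batch_size:
            write(buffer)   # stands in for a write to the cache file
            buffer = []
    if buffer:
        write(buffer)       # flush whatever is left at the end
    return peak


written = []  # stands in for the on-disk cache
peak_large = buffered_map(range(2500), lambda x: x * 2, 1000, written.extend)
peak_small = buffered_map(range(2500), lambda x: x * 2, 100, lambda rows: None)
print(peak_large, peak_small)
```

With audio, each processed row can be a large array, so going from 1,000 buffered rows to 100 can make a real difference.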
