Hi! This is a good way to define a dataset for audio classification.
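During `map`, only one batch at a time is loaded in memory and passed to your `preprocess_function`. The processed rows are then buffered and written to the cache file every `writer_batch_size` rows, so to use less memory you can try reducing `writer_batch_size` (the default is 1,000). A smaller value uses less temporary memory but means more write operations: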
```python
# Flush processed rows to the cache file every 100 rows instead of every 1,000
ds = ds.map(preprocess_function, remove_columns="audio", writer_batch_size=100)
```
EDIT: changed `batch_size` to `writer_batch_size`
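To make the memory trade-off concrete, here is a toy, pure-Python model of the write buffer. It is not how `datasets` implements `map` internally: `simulated_map` and `preprocess` are hypothetical names, and the model assumes processed rows are appended to an in-memory buffer one at a time and flushed to the "cache file" (a plain list here) whenever the buffer reaches `writer_batch_size` rows.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def simulated_map(
    rows: List[Row],
    fn: Callable[[Row], Row],
    writer_batch_size: int = 1000,
) -> Tuple[List[List[Row]], int]:
    """Hypothetical model of a map() that buffers processed rows before writing.

    Returns the flushed chunks (standing in for the on-disk cache file) and
    the peak number of processed rows held in memory at once.
    """
    written_chunks: List[List[Row]] = []
    buffer: List[Row] = []
    peak = 0
    for row in rows:
        buffer.append(fn(row))
        peak = max(peak, len(buffer))
        if len(buffer) == writer_batch_size:
            written_chunks.append(buffer)  # "write" the full buffer to the cache
            buffer = []
    if buffer:  # flush whatever is left at the end
        written_chunks.append(buffer)
    return written_chunks, peak


def preprocess(row: Row) -> Row:
    # Hypothetical preprocessing: keep the label, replace the raw audio with
    # its length (mimicking remove_columns="audio")
    return {"label": row["label"], "input_length": len(row["audio"])}


rows = [{"audio": [0.0] * 16, "label": i % 2} for i in range(2500)]
for size in (1000, 100):
    chunks, peak = simulated_map(rows, preprocess, writer_batch_size=size)
    print(size, len(chunks), peak)
# prints:
# 1000 3 1000
# 100 25 100
```

With the default of 1,000, up to 1,000 processed rows sit in memory before each write; with 100, the peak drops to 100 at the cost of 25 writes instead of 3.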