Hi, I have an audio data set in the following format: 16 kHz audio files in a single folder named “audio_dir”, plus a pandas DataFrame that maps each audio file to its label.
(Code to create this data set is at the end of this post)

Question:
What is the standard way to create a dataset from this data set to train an audio classification model?
More specifically, how can I use the facebook/hubert-large-ls960-ft feature extractor to create a Dataset to train a HuBERT model? I also need to truncate/pad every input to 10 seconds, which I’ve tried to do in the preprocess_function below.
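To make the 10-second requirement concrete, here is a minimal NumPy sketch of the behavior I’m after, independent of any feature extractor. The helper name pad_or_truncate and the choice of right-side zero-padding are assumptions made for illustration; they are not part of transformers or datasets.

```python
import numpy as np

SAMPLING_RATE = 16_000
MAX_SAMPLES = SAMPLING_RATE * 10  # 10 s at 16 kHz


def pad_or_truncate(audio, max_samples=MAX_SAMPLES):
    """Hypothetical helper: return a float32 array of exactly max_samples samples."""
    # Truncate clips that are longer than the cap.
    audio = np.asarray(audio, dtype=np.float32)[:max_samples]
    # Zero-pad clips that are shorter than the cap (padding on the right).
    padding = max_samples - audio.shape[0]
    return np.pad(audio, (0, padding))


short_clip = np.ones(80_000)   # 5 s clip, gets padded
long_clip = np.ones(240_000)   # 15 s clip, gets truncated
print(pad_or_truncate(short_clip).shape, pad_or_truncate(long_clip).shape)
```

Both calls return arrays of 160,000 samples, so every example ends up with the same input length.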
What I tried:
import numpy as np
import os
import pandas as pd
import soundfile as sf
from datasets import Dataset, Audio
from transformers import Wav2Vec2FeatureExtractor
# creating the dataset from pandas
ds = Dataset.from_pandas(labels)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
# feature extractor (HuBERT checkpoints reuse the Wav2Vec2 feature extractor class)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
def preprocess_function(examples):
    # non-batched map: `examples` is a single row, so wrap its array in a list
    audio_arrays = [examples['audio']['array']]
    inputs = feature_extractor(
        audio_arrays,
        sampling_rate=16_000,
        max_length=int(16_000 * 10),  # 10s
        truncation=True,  # cut clips longer than 10s
        padding="max_length",  # zero-pad clips shorter than 10s
    )
    return inputs
# map the preprocessing function
ds = ds.map(preprocess_function, remove_columns='audio')
This works fine when the data set is small, but it fails when there are many audio files (N ≈ 10,000) because the map operation exhausts the memory. I’m probably doing something wrong, because this clearly does not match what is described in “The magic of memory mapping”. What am I doing wrong? Thanks!
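For a sense of scale, here is a back-of-the-envelope upper bound on the size of the processed features. It assumes every clip reaches the 10 s cap and that each sample is stored as a 4-byte float32; both are assumptions, and shorter clips or a different dtype would change the number.

```python
N = 10_000                    # number of audio files
MAX_SAMPLES = 16_000 * 10     # 10 s at 16 kHz
BYTES_PER_SAMPLE = 4          # assumption: float32 storage

bytes_per_example = MAX_SAMPLES * BYTES_PER_SAMPLE
total_bytes = N * bytes_per_example

print(f"per example: {bytes_per_example / 1e6:.2f} MB")
print(f"total:       {total_bytes / 1e9:.1f} GB")
```

This only bounds the size of the map output; it does not show where the memory is actually being used.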
Code to create the data set:
# number of examples
N = 10
# labels file
labels = pd.DataFrame({
    'audio': [os.path.join('audio_dir', f"{i}.wav") for i in range(N)],
    'label': np.random.choice(['A', 'B'], N)
})
# save dummy audio files
os.makedirs("audio_dir", exist_ok=True)
for file_path in labels['audio']:
    dummy_audio = np.random.randn(np.random.randint(80_000, 240_000))  # between 5s - 15s long
    sf.write(file_path, dummy_audio, 16_000)
            