How to use audio augmentations for audio classification

Hi, I am very new to Hugging Face and audio datasets.

I have an audio dataset in a folder, which I loaded into a Dataset:

Dataset({
    features: ['audio', 'label'],
    num_rows: 50
})

Above, "audio" is a dict with the keys sampling_rate, path, and array.
I am doing audio classification: I use cast_column to cast to Audio, then apply a feature extractor, and then fine-tune the model. It works fine.
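For reference, this is roughly what my current pipeline looks like (a minimal sketch; the data path and checkpoint name are placeholders, assuming the audiofolder loader and a Wav2Vec2-style feature extractor):

```python
from datasets import load_dataset, Audio
from transformers import AutoFeatureExtractor

# Load the audio folder dataset (assumes one subfolder per label)
dataset = load_dataset("audiofolder", data_dir="path/to/audio_folder", split="train")

# Placeholder checkpoint; replace with the model being fine-tuned
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Resample the audio column to the rate the feature extractor expects
dataset = dataset.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))

def preprocess(batch):
    # Each "audio" entry is a dict with "array", "sampling_rate", "path"
    inputs = feature_extractor(
        [a["array"] for a in batch["audio"]],
        sampling_rate=feature_extractor.sampling_rate,
    )
    inputs["label"] = batch["label"]
    return inputs

encoded = dataset.map(preprocess, batched=True, remove_columns=["audio"])
```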

I noticed there is a class imbalance. I used to work with Keras's ImageDataGenerator, which provides data augmentation and handles class imbalance by generating augmented samples.

Can we do something similar with a Hugging Face Dataset, or how can I generate augmented data to handle the low-count classes?
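Something I was considering, in case it helps frame the question (just a sketch, not an official Datasets feature; it assumes the audiomentations library, placeholder augmentation parameters, and a placeholder minority label id): augment only the minority-class rows with dataset.map and append the augmented copies back onto the original dataset.

```python
import numpy as np
from datasets import concatenate_datasets
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift

# Waveform-level augmentation pipeline; parameters are only examples
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
])

MINORITY_LABEL = 1  # placeholder: id of the under-represented class

def augment_audio(example):
    audio = example["audio"]
    augmented_array = augment(
        samples=np.asarray(audio["array"], dtype=np.float32),
        sample_rate=audio["sampling_rate"],
    )
    # Write the augmented waveform back into the Audio column
    example["audio"] = {
        "array": augmented_array,
        "sampling_rate": audio["sampling_rate"],
    }
    return example

# Take only the minority-class rows, augment them, and append them
minority = dataset.filter(lambda ex: ex["label"] == MINORITY_LABEL)
augmented_minority = minority.map(augment_audio)
balanced = concatenate_datasets([dataset, augmented_minority])
```

An alternative I've read about is applying the augmentation on the fly inside the preprocessing function (e.g. via set_transform), so each epoch sees differently augmented versions instead of fixed copies, but I'm not sure which is the recommended way.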

Thanks.

Hi, I have the same question and want to augment my audio dataset. Did you find an answer? If so, I'd greatly appreciate it if you could share your findings or any progress you've made.

Same problem here. Did you find any solution?

I am also interested in this. Registering my interest to let people know this topic is in demand.