How to use audio augmentations for audio classification

Hi, I'm very new to Hugging Face and audio datasets.

I have an audio dataset in a folder, which I loaded into a `Dataset`:

```
features: ['audio', 'label'],
num_rows: 50
```

Above, `audio` is a dict with the keys `sampling_rate`, `path`, and `array`.
I'm doing audio classification: I used `cast_column` to cast the column to `Audio`, then ran a feature extractor, and then fine-tuned the model. It works fine.

I noticed there is a class imbalance. I used to work with Keras' `ImageDataGenerator`, which provides data augmentation and can help with class imbalance by generating augmented samples.

Can we do something similar with a Hugging Face dataset? Or how can I generate augmented data for the low-count classes?
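For example, I could imagine writing simple waveform-level augmentations myself and applying them to the minority classes with `Dataset.map`, but I don't know if that's the intended approach. A rough sketch of what I have in mind (the function names and the `noise_factor`/`shift` values are just my guesses):

```python
import numpy as np

def add_noise(wave, noise_factor=0.005, seed=0):
    """Add Gaussian noise to a waveform (noise_factor picked arbitrarily)."""
    rng = np.random.default_rng(seed)
    return wave + noise_factor * rng.standard_normal(len(wave))

def time_shift(wave, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(wave, shift)

# Stand-in for one example's audio["array"] from my dataset
wave = np.zeros(16_000, dtype=np.float32)
noisy = add_noise(wave)
shifted = time_shift(wave, 1_600)
```

I'd then apply these to copies of the minority-class rows (e.g. filter, map, then `concatenate_datasets` with the original) to balance the classes, but I'm not sure whether this is the recommended pattern or whether there is a built-in equivalent.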