ASTFeatureExtractor

tirengarfio · June 13, 2025, 8:52am

Hi,

I’m working on a Master’s dissertation on predicting music popularity using the AST model.

I’m now looking at the ASTFeatureExtractor (documented here: Audio Spectrogram Transformer), which converts raw audio files to Mel spectrograms.

It looks like the default value of the ‘max_length’ parameter of ASTFeatureExtractor is 1024. To me, 1024 means that only the first 10.24 seconds of each song will be fed to the model. Can anyone confirm that?

Regards

John6666 · June 13, 2025, 1:01pm

I think it’s probably about right. Maybe changing the hop will make a difference.
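To make the arithmetic explicit, here is a minimal sketch of the frame-to-time conversion, assuming a 10 ms hop between spectrogram frames. The helper names `frames_to_seconds` and `seconds_to_frames` are made up for illustration and are not part of any library.

```python
# Hypothetical helpers (not part of transformers): convert between the number
# of spectrogram frames and audio duration, given the hop between frames.

def frames_to_seconds(num_frames: int, hop_ms: float = 10.0) -> float:
    """Duration covered by `num_frames` frames spaced `hop_ms` milliseconds apart."""
    return num_frames * hop_ms / 1000.0


def seconds_to_frames(seconds: float, hop_ms: float = 10.0) -> int:
    """Number of frames needed to cover `seconds` of audio at the given hop."""
    return int(round(seconds * 1000.0 / hop_ms))


if __name__ == "__main__":
    # max_length=1024 with an assumed 10 ms hop
    print(frames_to_seconds(1024))
    # the same number of frames with an assumed 20 ms hop covers twice as much audio
    print(frames_to_seconds(1024, hop_ms=20.0))
    # frames needed for a 10-second clip at a 10 ms hop
    print(seconds_to_frames(10))
```

Under that assumption, 1024 frames cover roughly the first 10.24 seconds, so a longer hop or a larger max_length would be needed to cover more of each song.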

In /your_dataset/run.sh, you need to specify the data json file path. You need to set dataset_mean and dataset_std; if you don’t know them, you can use our AudioSet stats (mean=-4.27, std=4.57). You need to set audio_length, which should be the number of frames (e.g., with a 10ms hop, 10-second audio=1000 frames). You need to set the metrics in [acc,mAP] and loss in [CE,BCE]. You need to set the initial learning rate lr and the learning rate scheduler lrscheduler_{start,step,decay}. You also need to set the SpecAug parameters (freqm and timem; we recommend masking 48 frequency bins out of 128, and 20% of your time frames), the mixup rate (i.e., how many samples are mixup samples), batch size, etc. While it seems like a lot, it is easy if you start with one of our recipes: ast/egs/[audioset,esc50,speechcommands]/run.sh.
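As a rough sketch of how the numbers in that passage fit together, the snippet below derives audio_length, freqm and timem for a given clip length. The function `suggest_settings` is hypothetical; only the setting names and the recommended values (10 ms hop, 48 of 128 frequency bins, 20% of time frames, AudioSet mean and std) come from the quoted text.

```python
# Hypothetical helper: derive the values the quoted passage says to set in run.sh.
# It only reproduces the arithmetic; it does not read or write any run.sh file.

def suggest_settings(clip_seconds: float, hop_ms: float = 10.0) -> dict:
    # audio_length is the number of frames: clip duration divided by the hop
    audio_length = int(round(clip_seconds * 1000.0 / hop_ms))
    return {
        "audio_length": audio_length,
        # quoted recommendation: mask 48 frequency bins out of 128
        "freqm": 48,
        # quoted recommendation: mask 20% of the time frames
        "timem": int(round(0.2 * audio_length)),
        # AudioSet statistics from the quoted text, used when your own are unknown
        "dataset_mean": -4.27,
        "dataset_std": 4.57,
    }


if __name__ == "__main__":
    for name, value in suggest_settings(10.0).items():
        print(f"{name}={value}")
```

For your own dataset you would ideally replace dataset_mean and dataset_std with statistics computed on your spectrograms.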
