How to use MFCC feature extraction method while fine-tuning the pretrained model?

gaurav123 · December 16, 2021, 9:33pm

I am following the blog post below to fine-tune a pretrained model on my custom dataset: Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers. In the blog, the author mentions where a different feature extraction method could be used. Below is the relevant excerpt.

First, we load and resample the audio data, simply by calling 'batch["audio"]' . Second, we extract the 'input_values' from the loaded audio file. In our case, the Wav2Vec2Processor only normalizes the data. For other speech models, however, this step can include more complex feature extraction, such as [Log-Mel feature extraction](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). Third, we encode the transcriptions to label ids.
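As a rough illustration of the second step in the excerpt: the normalization it mentions is commonly a zero-mean, unit-variance rescaling of the raw waveform. The NumPy sketch below assumes that behaviour; the function name `normalize_waveform`, the epsilon value, and the synthetic input are illustrative and not taken from the blog post.

```python
import numpy as np


def normalize_waveform(waveform, eps=1e-7):
    """Rescale a 1-D waveform to zero mean and (approximately) unit variance."""
    waveform = np.asarray(waveform, dtype=np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)


# Synthetic stand-in for one second of 16 kHz audio (not real speech).
rng = np.random.default_rng(0)
raw = rng.normal(loc=0.3, scale=0.05, size=16_000).astype(np.float32)

input_values = normalize_waveform(raw)
print(input_values.shape)  # still a 1-D array with one value per audio sample
```

Note that the output keeps one value per audio sample, so `input_values` stays a 1-D waveform rather than a matrix of spectral features.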

Below is the section where, as far as I know, the change is supposed to be made.

def prepare_dataset(batch):
    audio = batch["audio"]

    # batched output is "un-batched"
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    batch["input_length"] = len(batch["input_values"])
    
    with processor.as_target_processor():
        batch["labels"] = processor(batch["sentence"]).input_ids
    return batch

The change I made is this line:
batch["input_values"] = librosa.feature.mfcc(audio["array"], n_mfcc=13, sr=audio["sampling_rate"])

However, this doesn’t seem to work. Can somebody help me out? Thanks.
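One likely reason for the failure, offered as a sketch rather than a confirmed diagnosis: Wav2Vec2 models take the raw 1-D waveform and compute their own features with a stack of 1-D convolutions, whereas `librosa.feature.mfcc` returns a 2-D array of shape `(n_mfcc, frames)`. The snippet below only does the shape arithmetic in plain Python. It assumes the standard Wav2Vec2 convolution kernel sizes and strides and librosa's default hop length of 512 with centered framing; the helper names are hypothetical.

```python
# Shape arithmetic only; no audio is loaded and no model is run.

# Kernel sizes and strides of the Wav2Vec2 convolutional feature encoder
# (assumed to be the standard configuration values).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def wav2vec2_num_frames(num_samples: int) -> int:
    """Frames the conv feature encoder produces from a raw 1-D waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def mfcc_shape(num_samples: int, n_mfcc: int = 13, hop_length: int = 512) -> tuple:
    """Shape of an MFCC matrix, assuming centered framing with the given hop."""
    return (n_mfcc, 1 + num_samples // hop_length)


one_second = 16_000  # samples in one second of 16 kHz audio
print(wav2vec2_num_frames(one_second))  # 49
print(mfcc_shape(one_second))           # (13, 32)
```

Because the pretrained convolutional layers were trained on waveform samples, a `(13, frames)` MFCC matrix passed as `input_values` does not match the input the checkpoint expects. Using MFCCs would presumably require a model with a different front end rather than a one-line change in `prepare_dataset`.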

hiba2 · April 30, 2023, 9:49pm

I have exactly the same problem, can somebody help me please?

Zahra99 · May 7, 2024, 2:53pm

Hi,
Did you find an answer to your question? I have the same question.
