Finetuning Wav2Vec2 vs. Finetuning DistilBERT

https://colab.research.google.com/github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb#scrollTo=pFSqZ0jwCMSv

I have been following the notebook above, but it seems to be out of date in a few ways. I run into issues right around the point where the author builds a batch of jagged ndarrays in the preprocessing map function.

When I finetuned a DistilBERT model on IMDB reviews, the process was much simpler: I just loaded the model, specified a different number of classes, and finetuned it with a Trainer object.
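For context, this is roughly the text workflow I mean (a minimal sketch; the dataset split sizes and training arguments are just illustrative):

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate/pad so every example has the same length
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

# The classification head is attached for you; just pass num_labels
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-imdb", num_train_epochs=1),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```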

Can someone explain to me:

  1. Is there an updated way to finetune an audio model for speech emotion analysis?
  2. Why do the processes differ so drastically between the audio and text sentiment models?
  3. Why do we have to attach a completely different classifier head via torch, as is done in the linked notebook? I was able to do the entire finetuning for the text model with transformers alone.

Thanks in advance for the information; anything about how things have changed since that notebook was written really helps (it used Python 3.7, so it was probably written a while ago).

Posting here as I found a better guide that’s a bit more up to date:

https://towardsdatascience.com/fine-tuning-hubert-for-emotion-recognition-in-custom-audio-data-using-huggingface-c2d516b41cd8

It appears that the reason DistilBERT (and the guide above) was easier is that the library now provides classes that attach the appropriate classification head to the model for you. Be careful with old documentation and tutorials: this package updates quite frequently, so older, more hands-on methods become obsolete (and our lives get easier!)
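For anyone landing here later, this is roughly how the same pattern now looks for audio, using AutoFeatureExtractor and AutoModelForAudioClassification from recent transformers releases. It is only a sketch: the dataset path, label count, and clip length are placeholders, and it assumes a datasets-style dataset with "audio" and "label" columns.

```python
from datasets import load_dataset, Audio
from transformers import (AutoFeatureExtractor, AutoModelForAudioClassification,
                          TrainingArguments, Trainer)

# Placeholder dataset: must have an "audio" column and an integer "label" column
dataset = load_dataset("path/to/your_emotion_dataset", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

def preprocess(batch):
    audio_arrays = [a["array"] for a in batch["audio"]]
    # Padding/truncating to a fixed length avoids the jagged-ndarray problem
    # from the older notebook (4-second clips here, purely as an example)
    return feature_extractor(audio_arrays, sampling_rate=16_000,
                             max_length=16_000 * 4, truncation=True,
                             padding="max_length")

dataset = dataset.map(preprocess, batched=True, remove_columns=["audio"])

# As with the text model, the classifier head is attached for you
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="wav2vec2-emotion", num_train_epochs=1),
    train_dataset=dataset,
)
trainer.train()
```

The same pattern should work with HuBERT checkpoints as well, since the audio classification classes wrap the encoder and pooling/classifier layers for you rather than requiring a hand-written torch head.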