Wav2Vec2 for Audio Emotion Classification

Winstead · March 11, 2021, 3:00pm

We are working on a thesis project at Spotify on Podcast Trailer Generation: Hotspot Detection for the Podcast Dataset.

The Spotify Podcast Dataset contains both transcripts and audio for many podcast episodes, and we are currently looking to use Wav2Vec2 embeddings as input to train an emotion classification model on the audio. The audio is currently English-only (with accompanying transcripts).

It would be much appreciated if you could help with fine-tuning Wav2Vec2 on some standard emotion-annotated audio datasets (e.g. RAVDESS, SAVEE). We will then use the fine-tuned embeddings as input for emotion classification and run a human evaluation of the classified results.
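For readers preparing labels from RAVDESS, here is a minimal sketch of how the emotion label might be read from a file name. It assumes the standard RAVDESS naming convention (seven hyphen-separated two-digit fields, with the third field encoding the emotion). The helper name `ravdess_emotion` is hypothetical and not part of any library, and SAVEE uses a different naming scheme that this sketch does not cover.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file-naming convention (third field of the name).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def ravdess_emotion(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name (hypothetical helper)."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path!r}")
    code = fields[2]
    if code not in RAVDESS_EMOTIONS:
        raise ValueError(f"unknown emotion code {code!r} in {path!r}")
    return RAVDESS_EMOTIONS[code]
```

Calling `ravdess_emotion("Actor_12/03-01-06-01-02-01-12.wav")` would give `"fearful"`, which can then be mapped to an integer class id for training.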

birgermoell · March 11, 2021, 3:14pm

That sounds great. I’m also working with fine-tuning Wav2Vec2. I can help you out if you have any questions. @patrickvonplaten is also a great person to ask.

patrickvonplaten · March 15, 2021, 6:04am

Hey @Winstead,

Thanks for your post here! I think it would be a good idea to use Wav2Vec2 for emotion classification. I won’t have time to fine-tune the model myself any time soon, but it should be rather straightforward to do so. A couple of things need to be done before Wav2Vec2 can be fine-tuned on emotion classification:

- Add a Wav2Vec2ForSpeechClassification model, implemented very similarly to BertForSequenceClassification.
- It would probably be much easier to train the model if the two datasets you linked above were added to datasets. It should be rather straightforward to do this yourself, or else you can open a dataset request issue in the library, and maybe someone in the open-source community will be interested in tackling it. See those issues for example: Issues · huggingface/datasets · GitHub

Once those things are added, it should be rather straightforward to train the model. You can search online for “transformers” sentiment analysis to get a feeling for how it should be done. See this article for example: https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/
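To make the first point more concrete, here is a rough sketch of what the classification part of such a model could look like in plain PyTorch. It is not an implementation from this thread or from the library: the class name `MeanPoolClassificationHead`, the choice of mean pooling over time, and the `frame_mask` argument are all assumptions made for illustration. The head expects encoder hidden states of shape (batch, frames, hidden_size), such as those a Wav2Vec2 encoder would produce.

```python
import torch
from torch import nn


class MeanPoolClassificationHead(nn.Module):
    """Hypothetical head: mean-pool frames, then dense -> tanh -> linear."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states, frame_mask=None):
        # hidden_states: (batch, frames, hidden_size)
        # frame_mask (assumed, optional): (batch, frames), 1 = real frame, 0 = padding
        if frame_mask is None:
            pooled = hidden_states.mean(dim=1)
        else:
            mask = frame_mask.unsqueeze(-1).to(hidden_states.dtype)
            # Average only over real frames; clamp avoids division by zero.
            pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        x = torch.tanh(self.dense(self.dropout(pooled)))
        return self.out_proj(self.dropout(x))  # logits: (batch, num_labels)


def classification_loss(logits, labels):
    # Standard cross-entropy over emotion classes, as in sequence classification.
    return nn.functional.cross_entropy(logits, labels)
```

In a full model, the hidden states would come from the pretrained Wav2Vec2 encoder, and the encoder and head would be trained together (or the encoder frozen at first) on the emotion labels.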

othrif · April 26, 2021, 12:48pm

@Winstead, have you made any progress on this?

I am also working on a similar project and will be happy to collaborate/assist.

Winstead · April 26, 2021, 3:07pm

Hi @othrif !

I’ve written the basic code and trained on an emotion-annotated speech dataset, but the accuracy has not been good so far. As I’m also working on several other approaches in the meantime and haven’t spent much time on fine-tuning Wav2Vec2, I believe there is a lot of room for improvement.

And yes, I would be glad to collaborate on or discuss this. How would you prefer to communicate?

m3hrdadfi · May 25, 2021, 8:28am

@Winstead, It would probably solve your problem.

Winstead · May 26, 2021, 3:50pm

@m3hrdadfi Awesome! Thank you so much!
