Wav2vec For Music Applications (generation, captioning, instrument classification)

tanmaylaud · June 23, 2021, 6:28pm

Wav2vec is SoTA for ASR. It would be interesting to explore this model, or other transformer architectures, for music AI applications. We can focus on one of the following, depending on bandwidth:
Instrument classification
Vocal separation or instrument segmentation
Emotion/rhythm/pitch analysis
Pitch shift

One thing I am curious about is:
If we train a music-to-lyrics model, what features do the layers learn? Can we fine-tune such a model on the other downstream tasks?
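
For the instrument classification case, here is a minimal sketch of what the fine-tuning setup could look like with the `transformers` PyTorch classes. Everything specific in it is an assumption for illustration: the four instrument labels are hypothetical, the config is deliberately tiny and randomly initialized so it runs without downloading anything, and the random tensors stand in for real 16 kHz music clips. A real experiment would load a pretrained checkpoint instead.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

# Hypothetical label set, chosen only for illustration.
INSTRUMENTS = ["guitar", "piano", "violin", "drums"]

# Tiny, randomly initialized model (assumption: sizes picked to run fast,
# not to match any published checkpoint).
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32, 32),
    conv_stride=(5, 2, 2),
    conv_kernel=(10, 3, 3),
    classifier_proj_size=32,
    num_labels=len(INSTRUMENTS),
)
model = Wav2Vec2ForSequenceClassification(config)
model.eval()

# Two fake one-second clips at 16 kHz standing in for real audio.
waveform = torch.randn(2, 16000)
labels = torch.tensor([0, 2])  # "guitar", "violin"

with torch.no_grad():
    out = model(
        input_values=waveform,
        labels=labels,
        output_hidden_states=True,  # expose per-layer features for inspection
    )

logits_shape = tuple(out.logits.shape)     # (batch, number of instruments)
num_layer_outputs = len(out.hidden_states) # input to the encoder + one per layer
print(logits_shape, num_layer_outputs, float(out.loss))
```

The `hidden_states` tuple is one way to start on the "what do the layers learn" question: each entry can be pooled over time and fed to a small probe classifier to see which layers carry information about a given music task.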

patrickvonplaten · June 24, 2021, 8:29am

Hey @tanmaylaud,

This paper might be of interest: SUPERB: Speech processing Universal PERformance Benchmark (arXiv:2105.01051). I’m also currently working on adding Wav2Vec2 in Flax, see the pull request "[WIP][Flax] Add wav2vec2" (#12271 in huggingface/transformers on GitHub). Hopefully this will be merged by next week.

mbien · July 3, 2021, 8:27pm

Hi, I recently ran a semester project in musicology and tried to use wav2vec for music popularity prediction. Unfortunately it did not succeed, but maybe you can find the code useful for other purposes: milestone3-wav2vec2.ipynb in the Glorf/DH-401 repository on GitHub (the project description is in the repo).
