Wav2vec For Music Applications (generation, captioning, instrument classification)

Wav2vec2 is state-of-the-art (SoTA) for ASR. It would be interesting to explore this model, or other transformer architectures, for Music AI applications. We could focus on one of the following, depending on bandwidth:
- Instrument classification
- Vocal separation or instrument segmentation
- Emotion/rhythm/pitch analysis
- Pitch shift
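For the first task, a minimal sketch of what instrument classification on top of wav2vec2 could look like: a linear head over mean-pooled frame features. All shapes and names here are assumptions (768-dim features as in wav2vec2-base, a hypothetical 10-instrument label set); in practice the features would come from something like `Wav2Vec2Model(...).last_hidden_state`, but a random tensor stands in so the sketch runs without downloading a checkpoint.

```python
import torch
import torch.nn as nn

NUM_INSTRUMENTS = 10  # hypothetical label set (piano, guitar, ...)

class InstrumentClassifier(nn.Module):
    """Linear classification head over frozen wav2vec2-style features."""

    def __init__(self, feature_dim=768, num_classes=NUM_INSTRUMENTS):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, features):
        # features: (batch, frames, feature_dim) from the wav2vec2 encoder
        pooled = features.mean(dim=1)  # mean-pool over time frames
        return self.head(pooled)

clf = InstrumentClassifier()
# Stand-in for encoder output: ~10 s of audio at ~50 frames/s
fake_features = torch.randn(2, 499, 768)
logits = clf(fake_features)
print(logits.shape)
```

Fine-tuning would then just mean unfreezing some or all of the encoder and training end-to-end on labeled clips.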

One thing I am curious about:
If we train a music-to-lyrics model, what features are being learnt by the layers? Can we fine-tune such a model on the other downstream tasks?
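One common way to answer the "what do the layers learn" question is layer-wise probing: fit a small linear probe on each transformer layer's hidden states and compare how well each layer predicts a downstream label. A hedged sketch of the loop, assuming wav2vec2-base dimensions (12 layers, 768-dim features); with a real model the hidden states would come from `Wav2Vec2Model(..., output_hidden_states=True)`, but random tensors stand in here so the sketch runs offline.

```python
import torch
import torch.nn as nn

NUM_LAYERS, FEATURE_DIM, NUM_CLASSES = 12, 768, 10  # assumed wav2vec2-base shapes

# Stand-in for per-layer encoder outputs: (batch, frames, feature_dim)
hidden_states = [torch.randn(4, 249, FEATURE_DIM) for _ in range(NUM_LAYERS)]
labels = torch.randint(0, NUM_CLASSES, (4,))  # hypothetical downstream labels

probe_accuracy = []
for layer_feats in hidden_states:
    probe = nn.Linear(FEATURE_DIM, NUM_CLASSES)  # one fresh linear probe per layer
    pooled = layer_feats.mean(dim=1)             # mean-pool over time frames
    logits = probe(pooled)
    # A real probe would be trained and evaluated on held-out data;
    # this just records the untrained score to keep the sketch short.
    acc = (logits.argmax(dim=-1) == labels).float().mean().item()
    probe_accuracy.append(acc)

print(len(probe_accuracy))  # one score per layer
```

Plotting accuracy per layer would show where task-relevant features (pitch, timbre, lyrics alignment) concentrate, which also suggests which layers to freeze when fine-tuning on the other tasks.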


Hey @tanmaylaud,

This paper: [2105.01051] SUPERB: Speech processing Universal PERformance Benchmark might be of interest. I’m also currently working on adding Wav2Vec2 in Flax, see: [WIP][Flax] Add wav2vec2 by patrickvonplaten · Pull Request #12271 · huggingface/transformers · GitHub. Hopefully this will be merged by next week.


Hi, I recently ran a semester project in musicology and tried to use wav2vec for music popularity prediction. Unfortunately it was without success, but maybe you'll find the code useful for other purposes: DH-401/milestone3-wav2vec2.ipynb at main · Glorf/DH-401 · GitHub (the project description is in the repo)