Can Wav2Vec2 distinguish music during speech-to-text?

aggonca · May 25, 2023, 1:14pm

Hi everyone,

I have a custom dataset in which music makes up 10% of the test set. I want to label those segments as “music” while performing speech-to-text. I am using the wav2vec2 model for speech-to-text and want to know whether it can distinguish music from speech during transcription. I tried it, but the loss value started out really high and then decreased to zero after a while. If anyone can guide me through it, I would appreciate it.

Thanks!

anon32239754 · August 27, 2023, 3:17pm

My organisation is in trouble: my whole business is based on STT, and I need more accurate STT. SeamlessM4T is not able to convert any audio fully, and my audio contains a little bit of noise. I run a testing platform for students preparing for IELTS, PTE, and TOEFL. Students answer a given question in audio form, and to evaluate the answer I need a fully accurate transcript of that audio so I can analyse their grammar and mistakes. I have used Whisper in the frontend, but the problem is that Whisper auto-corrects and sometimes gets stuck on one word, repeating it again and again. I have used the Web Speech API as well, but it gets stuck partway through. I have a huge transcription volume, approximately 80,000 hours per month.
