Lately, I’ve been exploring AI tools for music: generating songs, separating instrumentals from vocals, and so on. Because of that, I became interested in training my own model, where the main idea is for it to receive an audio input (preferably vocals only) and transform the input voice into a fixed target voice.
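For context, the preprocessing step I have in mind looks something like this: a minimal sketch using Demucs to isolate the vocal stem (the input file and output directory are just placeholders):

```python
import subprocess

# Split a track into vocals and accompaniment with Demucs's two-stem mode.
# This writes vocals.wav and no_vocals.wav into a model-named subfolder
# under the output directory.
subprocess.run(
    [
        "demucs",
        "--two-stems=vocals",  # separate only vocals vs. everything else
        "-o", "separated",     # output directory (placeholder)
        "song.mp3",            # input track (placeholder)
    ],
    check=True,
)
```

The resulting vocals.wav would then be the input to the voice-conversion model.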
I work in data science, but my focus has been mostly on natural language models, so I wanted to see if there’s anyone willing to give me some tips about the audio field, haha
What are the most commonly used models in this area?
When training the model, would it be ideal to build a dataset of the target voice in a variety of scenarios (speaking, singing, etc.), or could I limit it to just a “spoken” dataset, for example?
If it’s voice-to-voice conversion, I think RVC2 is the most popular one.
There are all sorts of resources on external sites.
I’m not too familiar with voice-changer systems, but I think there’s one in the collection. What was the name of the model…?
Well, if you find a Hugging Face Space with a similar purpose and look at its requirements.txt, you should be able to find what it uses.
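For example, you can pull just the requirements.txt from a Space without cloning the whole repo; a minimal sketch with huggingface_hub (the repo id is a placeholder, substitute a real Space):

```python
from huggingface_hub import hf_hub_download

# Fetch only the requirements.txt of a Space repo to see which
# voice libraries (and pinned versions) it depends on.
path = hf_hub_download(
    repo_id="someuser/some-voice-space",  # placeholder Space id
    repo_type="space",
    filename="requirements.txt",
)
with open(path) as f:
    print(f.read())
```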
But be aware that many of the voice-related libraries are only a year or so old, so build errors tend to come up quite often.
So, I’m not very familiar with voice-changing systems either. What I’ve been testing are online tools like Suno and Moises, and using them got me interested in trying some fine-tuning in this area.
And thank you very much for your response, @John6666
Now, with all these leads to check out, I’ll probably be able to explore a lot o/
I really appreciate it, this will definitely help!
And Suno is out of this world. But getting even 30% close to that would already be excellent.