I am interested in creating an AI tool that converts text into speech using a specific voice that I upload. To clarify, here is what I am aiming to achieve:
- Input Specifications:
  - Text: For example, "I love sleeping."
  - Voice Sample: A recording of someone’s voice saying something like "This is your fault."
- Output Specifications:
  - The output should be the text "I love sleeping" spoken in the voice from the provided sample.
In other words, I want to create a text-to-speech system where the output text is spoken in the voice of the provided audio sample.
Questions:
- Is it feasible to build such an AI from scratch? What kind of technology or frameworks would be suitable for this project?
- What programming languages or tools would you recommend? I am open to suggestions, but ideally I would like to use something that is well suited to AI tasks.
- Are there any existing resources or libraries that might help in building this type of voice morphing system?
Please note that I may not continue with this project if it turns out to be too complex.