Seeking guidance on building a text-to-speech AI with custom voice morphing

I am interested in creating an AI tool that converts text into speech using a specific voice that I upload. To clarify, here is what I am aiming to achieve:

  1. Input Specifications:
    Text: for example, "I love sleeping."
    Voice Sample: a recording of someone’s voice saying something like "This is your fault."

  2. Output Specifications:
    The output should be the text "I love sleeping" spoken in the voice from the provided sample.

In other words, I want a text-to-speech system that speaks the input text in the voice of the provided audio sample.

Questions:

  • Is it feasible to build such an AI from scratch? What kind of technology or frameworks would be suitable for this project?
  • What programming languages or tools would you recommend? I am open to suggestions, but ideally I would like to use something well suited to AI tasks.
  • Are there any existing resources or libraries that might help in building this type of voice morphing system?

Fair warning: I may not continue with this project if it turns out to be too complex.