Speakers
Vaibhav (VB) is a consultant turned student researcher at the University of Stuttgart, Germany. His current research is in performance prediction for NLP models and speech synthesis. He is also an active volunteer with EuroPython and Python DE. LinkedIn: https://www.linkedin.com/in/vaibhavs10/
Vatsal left the world of mathematics in 2017 to dive into speech synthesis soon after coming across the WaveNet paper. His research has focused on normalising flows, a particular kind of deep generative model. At Amazon, he researched the deep-learning-based vocoding module used in production, as well as disentanglement in deep generative models for zero-shot speech generation (text-to-speech & voice conversion), publishing 4 papers, filing 5 patents, and developing multiple product proofs of concept. Beyond speech, Vatsal has also spent some time in a team of researchers focused on Bayesian models/sparse Gaussian processes. LinkedIn: https://www.linkedin.com/in/vatsal-aggarwal-993472104/.
You can post all your questions in this topic! They will be answered during the session.
If I want to train a TTS model on my own speech to imitate my voice, how much recorded audio do I need to make for training a model with good quality?
Is an English pretrained model good for fine-tuning on other languages? Or is it necessary to find a model trained on the same language as the target one?
I don't think it's necessary: starting from an English model should be better than starting from scratch (but you will most likely need to change the vocabulary). Finding a model trained in the target language is always better.
Thank you for joining the stream yesterday.
I’m putting together the responses to the questions below:
Question: If I want to train a TTS model on my own speech to imitate my voice, how much recorded audio do I need to make for training a model with good quality?
Question: Is an English pretrained model good for fine-tuning on other languages? Or is it necessary to find a model trained on the same language as the target one?
If you have nothing else available, an English model can be a good starting point, or at least serve as a good initial baseline. Typically, fine-tuning a model pretrained in the target language is preferable.
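To make "changing the vocabulary" concrete: in a character-based TTS model, text is mapped to integer IDs through a character vocabulary, and the target language's character set rarely matches English (umlauts, accents, etc.), so the mapping has to be rebuilt before fine-tuning. A minimal sketch (the helper name and toy corpora below are illustrative, not from any specific TTS toolkit): characters shared with the English vocabulary can keep their pretrained embedding rows, while new characters need freshly initialised ones.

```python
# Sketch: rebuilding a character vocabulary when transferring a
# character-based TTS model from English to a new target language.
# build_vocab and the toy corpora are hypothetical, for illustration only.

def build_vocab(texts):
    """Map every character seen in the corpus to an integer ID,
    reserving ID 0 for a padding token."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i for i, ch in enumerate(["<pad>"] + chars)}

english_vocab = build_vocab(["hello world", "speech synthesis"])
german_vocab = build_vocab(["hallo welt", "sprachsynthese", "grüße"])

# Characters present in both vocabularies: their pretrained embedding
# rows can be copied over to the new model.
shared = set(english_vocab) & set(german_vocab)

# Characters only in the target language (e.g. 'ü', 'ß') need
# newly initialised embedding rows before fine-tuning starts.
new_chars = set(german_vocab) - set(english_vocab)
print(sorted(new_chars))  # → ['a', 'g', 'ß', 'ü']
```

The same idea applies to phoneme-based front ends, where the symbol set is the phoneme inventory rather than raw characters.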